Abstract

We demonstrate a family of propositional formulas in conjunctive normal form so that a formula
of size $N$ requires size $2^{\Omega(\sqrt[7]{N/\log N})}$ to refute using the tree-like OBDD
refutation system of Atserias, Kolaitis and Vardi [3] with respect to all variable orderings.
All known symbolic quantifier elimination algorithms for satisfiability generate tree-like proofs
when run on unsatisfiable CNFs, so this lower bound applies to the run-times of these algorithms.
Furthermore, the lower bound generalizes earlier results on OBDD-based proofs of unsatisfiability
in that it applies for all variable orderings, it applies when the clauses are processed according
to an arbitrary schedule, and it applies when variables are eliminated via quantification.

1 Introduction

Ordered binary decision diagrams (OBDDs) are data structures for representing Boolean functions
[HI \7\ El] that are widely used when solving problems in circuit synthesis and model checking
(cf. [H [3 [301 [13]). A large number of OBDD-based algorithms have been implemented for solving
the Boolean satisfiability problem P[42l[l8l[ini[IIl[Il[Ml[33l[2l[l3[Ml[22l[31[2l]. Many of these
algorithms are known to efficiently generate proofs of unsatisfiability for CNFs known to require
exponential running times for other methods, such as the pigeonhole principle that states $n + 1$
objects cannot be placed into $n$ holes without a collision. It is therefore not immediately clear
what the limitations of OBDD-based methods are. While it would immediately follow from the
hypothesis $P \neq NP$ that such methods cannot solve all satisfiability instances in time
polynomially bounded by the input size, that sort of thinking strikes us as begging the question,
and here we present unconditional limitations for algorithms of this kind: We unconditionally show
that a wide class of OBDD-based satisfiability algorithms cannot solve all satisfiability instances
in sub-exponential time. Prior analyses of the runtimes of OBDD-based satisfiability methods have
been limited in their application because of assumptions on the order of processing the input
clauses [20] [H] or an assumption on the variable ordering used when building the OBDDs [3], so
this is the first unconditional lower bound that applies even to a system that explicitly
constructs the OBDD for a CNF by selecting a variable ordering and then conjoining the clauses
according to a heuristically chosen order.

More formally, we present superpolynomial size lower bounds for the tree-like OBDD refutation
system and satisfiability algorithms based on explicit OBDD construction and symbolic quantifier
elimination. We give two motivations for studying minimum refutation sizes for proof systems and
satisfiability algorithms. The first is that it is a necessary and tractable step towards
understanding larger questions: Whether or not there is a polynomial-time algorithm for
satisfiability, and whether or not propositional proof systems manipulating Boolean circuits can
prove every tautology in size bounded by a polynomial in the size of the tautology (formalized as
whether or not the extended-Frege proof systems are polynomially bounded, cf. [26]). Both of these
problems seem well beyond our current understanding. Rather than try to understand all
polynomial-time computations or all extended-Frege proofs, we study the sizes of proofs of
unsatisfiability for a particular class of satisfiability algorithms and extended-Frege proofs:
in this case, tree-like OBDD refutations. Under this interpretation, the main result of this paper
can be interpreted as saying "As far as symbolic quantifier elimination algorithms are concerned,
P is different from NP." The second motivation is to develop a taxonomy of satisfiability methods
and identify the kinds of reasoning best suited to each method. Under this interpretation, the
main result of this paper can be interpreted as saying "While symbolic quantifier elimination
methods can perform efficiently on some structured formulas such as the $n + 1$ to $n$ pigeonhole
principle, such methods inherently face an exponential blow-up when reasoning about the behavior
of a system acted upon by a permutation."

1.1 Using OBDDs for Satisfiability and Propositional Proofs

One motivation for developing satisfiability algorithms based on OBDDs is the hope to escape the
limitations of the resolution proof system. Most current satisfiability engines, in particular the
DLL with clause learning approach [29] [32] [TTl [TB], implement the resolution proof system [40]
and therefore require exponential running times on the many CNFs known to require exponential size
resolution refutations [211 dSj [121 [3 [33 [1]. The hope is that by developing algorithms that
implement proof systems other than resolution, new satisfiability algorithms will be able to
efficiently solve satisfiability instances not yet efficiently solvable.

An OBDD is a read-once branching program in which the variables appear according to a fixed
order along every path (i.e., the nodes are arranged in levels, all nodes at a level query the same
variable, and each variable corresponds to at most one level). The choice of variable ordering can
affect the size of the OBDD by an exponential factor, and choosing a suitable variable ordering for
a task is of utmost importance. The primary utility of the ordering restriction is that with respect
to each fixed ordering, the OBDD computing a Boolean function is unique, up to a linear-time
reduction to normal form (cf. [31]). Because of this canonicity property, the equality test for two
Boolean functions represented as OBDDs is simply a check that their OBDDs are identical. Many
simple but useful functions have small OBDDs with respect to some variable ordering, and many set
operations, such as union and intersection, can be computed in polynomial time from two OBDDs.
These properties make OBDDs well-suited for reasoning about symbolically encoded sets of states,
and their use revolutionized the field of model checking [30] [13]. In light of this success, a
number of attempts have been made to utilize OBDDs for more efficient satisfiability algorithms.
The results of this paper apply to two such methods, explicit construction and symbolic quantifier
elimination, but do not clearly apply to a third, compressed resolution.
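
To make the dependence on the variable ordering concrete, here is a minimal sketch that counts the
nodes of a reduced OBDD by brute-force Shannon expansion. The function name `robdd_size`, the
example function and the two orderings are illustrative assumptions rather than anything from the
cited OBDD packages, and the enumeration takes time exponential in the number of variables, so it
is only meant for toy inputs.

```python
def robdd_size(f, order):
    """Count the nodes (internal nodes plus sinks) of the reduced OBDD of f.

    f     : callable taking a dict {variable name: bool} and returning a bool
    order : list of variable names, queried from first to last
    """
    unique = {}    # (variable, low child, high child) -> node id; merges equal nodes
    sinks = set()  # which of the sinks "0" and "1" are actually reached

    def build(level, partial):
        if level == len(order):
            sink = "1" if f(partial) else "0"
            sinks.add(sink)
            return sink
        var = order[level]
        low = build(level + 1, {**partial, var: False})
        high = build(level + 1, {**partial, var: True})
        if low == high:          # the test on var is redundant, so skip it
            return low
        key = (var, low, high)
        if key not in unique:
            unique[key] = len(unique)
        return unique[key]

    build(0, {})
    return len(unique) + len(sinks)


def f(a):
    # Hypothetical example function: (x1 and y1) or (x2 and y2) or (x3 and y3)
    return (a["x1"] and a["y1"]) or (a["x2"] and a["y2"]) or (a["x3"] and a["y3"])


interleaved = ["x1", "y1", "x2", "y2", "x3", "y3"]
separated = ["x1", "x2", "x3", "y1", "y2", "y3"]
print(robdd_size(f, interleaved), robdd_size(f, separated))
```

On this example the interleaved order yields 8 nodes and the separated order 16, and the gap widens
exponentially as more pairs are added.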

Explicit construction. In the literature, this is sometimes called the "OBDD apply" method.
In this method, a variable ordering is selected, the OBDD for the CNF with respect to that ordering
is constructed, and it is checked whether this OBDD is the constant false [6]. Proofs in this system
are straightforward: We begin with the OBDDs representing each clause, and we repeatedly conjoin
them together until we obtain an OBDD for the conjunction of all the clauses. There are two
opportunities for cleverness: the variable ordering used to construct the OBDDs, and the order in
which the clauses are joined together, cf. [42] [Tl [22]. Empirical studies [42] [14] and a
mathematical analysis of the implementation in which the clauses are conjoined in the same order as
the input presentation [20] have suggested that this method is incomparable with resolution-based
methods.

Symbolic quantifier elimination. This method extends the explicit construction method by
strategically eliminating variables via the application of existential quantifiers [18] [H [36] [22].
In particular, to determine if a CNF $\bigwedge_i C_i(x)$ is satisfiable, rather than build an OBDD
for $\bigwedge_i C_i(x)$, it suffices to build one for $\exists x \bigwedge_i C_i(x)$. This can be
more efficient because it is often the case that the OBDD for $\exists x F(x,y)$ is significantly
smaller than the OBDD for $F(x,y)$. One example of this approach is to first use heuristic methods
to partition the variables into sets $X_1, \ldots, X_k$ and the clauses into sets
$A_1, \ldots, A_k$ so that for each $i = 1, \ldots, k$, the variables of $X_i$ do not appear in the
clauses belonging to sets $A_{i+1}, \ldots, A_k$, then construct the OBDD for the quantified
Boolean formula:

$$\exists X_k \Big( \cdots \exists X_2 \Big( \Big( \exists X_1 \bigwedge_{C \in A_1} C \Big) \wedge \bigwedge_{C \in A_2} C \Big) \cdots \wedge \bigwedge_{C \in A_k} C \Big)$$



It has been observed that symbolic quantifier elimination leads to significant speed-ups over explicit
OBDD construction on random 3-CNFs [18] |l], and that, on a certain mix of structured benchmarks,
symbolic quantifier elimination solves more instances before time-out than solvers based on
resolution or compressed resolution [22] [36].
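
The conjoin-then-quantify scheme described above can be sketched on a toy scale as follows. This is
not taken from any of the cited solvers: explicit sets of satisfying rows stand in for OBDDs,
clauses are written as DIMACS-style integer lists, and the names `clause_relation`, `conjoin`,
`project` and `satisfiable`, as well as the one-variable-per-bucket schedule, are assumptions made
for the example.

```python
from itertools import product


def clause_relation(clause):
    """Return (variables, satisfying rows) for a clause such as [1, -2]."""
    vars_ = tuple(sorted({abs(lit) for lit in clause}))
    rows = set()
    for bits in product([False, True], repeat=len(vars_)):
        val = dict(zip(vars_, bits))
        if any(val[abs(lit)] == (lit > 0) for lit in clause):
            rows.add(bits)
    return vars_, rows


def conjoin(r, s):
    """Conjunction of two relations (the 'Conjunction' step)."""
    (rv, rrows), (sv, srows) = r, s
    vars_ = tuple(sorted(set(rv) | set(sv)))
    rows = set()
    for bits in product([False, True], repeat=len(vars_)):
        val = dict(zip(vars_, bits))
        if tuple(val[v] for v in rv) in rrows and tuple(val[v] for v in sv) in srows:
            rows.add(bits)
    return vars_, rows


def project(r, x):
    """Existentially quantify variable x out of relation r (the 'Projection' step)."""
    rv, rrows = r
    keep = [i for i, v in enumerate(rv) if v != x]
    return tuple(rv[i] for i in keep), {tuple(row[i] for i in keep) for row in rrows}


def satisfiable(clauses, elimination_order):
    """Eliminate every variable in turn; each intermediate relation is used exactly once."""
    pool = [clause_relation(c) for c in clauses]
    for x in elimination_order:
        bucket = [r for r in pool if x in r[0]]
        pool = [r for r in pool if x not in r[0]]
        if not bucket:
            continue
        joined = bucket[0]
        for r in bucket[1:]:
            joined = conjoin(joined, r)
        pool.append(project(joined, x))
    # What remains mentions no variables: {()} means "true", the empty set means "false".
    return all(rows for _, rows in pool)


print(satisfiable([[1, 2], [-1, 2], [1, -2], [-1, -2]], [1, 2]))
print(satisfiable([[1, 2], [-1, 2]], [1, 2]))
```

The elimination order must list every variable of the CNF. Because each relation is consumed by
exactly one later step, the derivation has the tree-like shape discussed in this paper.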

When formalized as proof systems, these algorithms can be viewed as tree-like versions of the
OBDD propositional proof system described by Atserias, Kolaitis and Vardi [3]. This proof system is
highly non-trivial: OBDDs are circuits, not formulas, so this proof system is a kind of weak
extended-Frege system.¹ Because it is not believed possible to convert OBDDs into formulas without
an exponential blow-up, the OBDD proof system is not expected to be p-simulatable by Frege systems
(in the sense of Cook and Reckhow y^). The tree-like OBDD system possesses polynomial-size
refutations of the $n + 1$ to $n$ pigeonhole principle, and it can p-simulate several interesting
proof systems, such as tree-like resolution, Gaussian refutations over a finite field, and tree-like
cutting planes refutations with unary coefficients [3].

Compressed resolution and compressed search. The analysis of this paper does not apply
to these systems in a clear way, and we take a few paragraphs to discuss why not. Compressed
resolution and search methods use OBDDs (or sometimes a variant known as ZDDs, or zero-suppressed
binary decision diagrams, cf. [31]) to encode exponentially large resolution refutations.
A well-known example of this method is multiresolution, developed by Chatalic and Simon [10] [11].
In multiresolution, the set of clauses in the refutation is represented symbolically with a ZDD,
and the Davis-Putnam variable elimination step is performed using ZDD operations, so that many
resolution steps are handled simultaneously. In addition to the DP procedure, clause learning and
breadth-first search algorithms have been implemented in the compressed setting [33] [34] [35].

The reason that the lower bound of this paper does not seem to apply to "compressed proof
systems" is that in these systems, the OBDDs are not over the same variables as the input CNF. The
OBDDs symbolically encode a large resolution proof, so they work over new variables that encode
clauses over the original variables. A typical encoding has, for each literal $l$ over the original
input CNF variables, a new variable $y_l$ that corresponds to whether or not the literal $l$ is
present in a clause. In this way, compressed methods are akin to the "implicit proofs" described by
Krajicek [27].

¹ For the uninitiated, Frege systems are basically the standard textbook-style systems of
propositional logic manipulating Boolean formulas, whereas extended Frege systems manipulate Boolean
circuits. From a computational complexity perspective, Frege systems can be thought of as
manipulating concepts definable in $NC^1$ and extended Frege systems can be thought of as
manipulating concepts definable in P.

1.2 The Result and Comparisons with Earlier Work 

The main result of this paper is that for infinitely many values of $N$, there is an unsatisfiable
CNF $\Phi$ of size $N$ so that every tree-like OBDD refutation of $\Phi$ has size at least
$2^{\Omega(\sqrt[7]{N/\log N})}$ (Theorem 8). This lower bound generalizes earlier work on proving
size lower bounds for OBDD-based proofs of unsatisfiability in three ways: The proofs can use
variable elimination via existential quantifiers, the clauses of the input CNF can be processed in
any order (so long as they are recombined according to a tree structure), and the variable ordering
of the OBDDs can be arbitrary. The two previously published results regarding size lower bounds for
OBDD proofs of unsatisfiability either made use of a restriction on the order in which the clauses
are processed, or held only for a fixed ordering on the variables.

In [20], Groote and Zantema prove a size lower bound for refutations in the OBDD-apply system
that conjoins the clauses of the CNF in the order of the input listing (i.e., to process
$C_1 \wedge (C_2 \wedge C_3)$, an OBDD for $C_2 \wedge C_3$ is built and then one for
$C_1 \wedge (C_2 \wedge C_3)$ is built). In fact, in that paper they give a size lower bound for
refutations of a formula of the form $\neg x \wedge (x \wedge \psi)$, which is trivial to refute if
the formula is processed as $(\neg x \wedge x) \wedge \psi$. Qualitatively, Theorem 8 generalizes
their bound by applying to systems that eliminate variables by quantification, and by applying to
systems that allow the clauses to be processed in an arbitrary manner. However, their bound is
quantitatively stronger: Where $N$ is the size of the difficult CNF, their bound on refutation size
is 2^^^^) whereas ours is 2^(^).

In [3], Atserias, Kolaitis, and Vardi formalized the OBDD-based propositional proof system
incorporating symbolic quantifier elimination, and proved that for each fixed variable ordering,
there is a CNF of size $N$ that requires size 2^"'^' to refute in the OBDD proof system using that
particular variable ordering. The two results are incomparable. The bound of [3] applies to the
general (DAG-like) system, whereas Theorem 8 only applies to the tree-like system. On the other
hand, Theorem 8 shows that there is a CNF for which every refutation with respect to every variable
ordering has nearly-exponential size. The result of [3] says that for each variable ordering, there
is a CNF for which that ordering is a poor choice, and does not eliminate the possibility that for
each CNF there is a variable ordering for which the CNF will be refuted in (say) time linear in the
size of the CNF. Theorem 8 eliminates this possibility for the tree-like case, which includes all
known implementations of these algorithms.

The analysis of Theorem 8 is the first that applies to all symbolic quantifier elimination
algorithms so far developed [181 E IMl HSl E]. It is not hard to see upon inspection that these
algorithms generate proofs of unsatisfiability in the tree-like OBDD system. Moreover, the results
of [20] do not apply to these methods, as the methods typically perform a preprocessing analysis
that chooses the order in which clauses are combined, and the methods eliminate variables via
existential quantification. The results of [3] do not apply to these methods because the variable
ordering is typically selected by some static analysis of the input CNF.

1.3 The Technique and its Comparison with Earlier Work

The argument is a reduction: We produce a CNF so that if there is a small refutation of the CNF
in the tree-like OBDD proof system, then there is a low-communication randomized two-player
protocol for the set-disjointness function. The set-disjointness function is known to require high
communication [25] [39], so all refutations of this CNF must be large. The reduction is obtained by
the interpolation by a communication game technique that has been well-used in the propositional
proof complexity community for some time now [23] [3]. However, there is a wrinkle that complicates
our return to this well-trodden path. Accounting for all possible variable orderings for the OBDDs
corresponds to proving communication lower bounds that hold under all ways of partitioning the
inputs, the so-called best-case partition model in communication complexity.

The analysis takes a turn from the beaten path at how the reduction fares under this best-case
partitioning of variables. Indeed, the reduction can be thought of as a variant of the reduction
given by Raz and Wigderson [38] in which an adversarial partitioning of the variables has taken
place. The reductions in [38] [23] [3] show that there is a search problem in variables $U$ and $V$,
$Search(U,V)$, and a randomized one-sided-error reduction from set-disjointness (in variables $X$
and $Y$) to $Search(U,V)$ in which player I creates an assignment to $U$ using $X$ and player II
creates an assignment to $V$ using $Y$. These reductions make heavy use of the structure inherent in
the fixed partition of the variables of the search problem. In the best-case partition scenario that
our reduction handles, we provide a search problem $Search(W)$ and show that no matter how the
variables of $W$ are partitioned into two equal-sized sets $U$ and $V$, there is a reduction from
set-disjointness to the search problem in which player I creates an assignment to $U$ using $X$ and
player II creates an assignment to $V$ using $Y$.

Over the course of analyzing the randomized reduction, in particular its distribution on
placing gadgets, we develop a framework for passing local density results that hold for the uniform
distribution to distributions that we say are "generated by dependent domains with blocking
processes". While these techniques are quite simple, they may be of interest for analyzing other
random processes and reductions that exploit structure in dense graphs or set systems.

1.4 Outline of this Article

Sections 2 and 3 are notation and background. The CNF that we prove difficult for OBDD
refutations is introduced in Section 4. Because of the central role of handling the partition of the
variables, Section 5 is dedicated to the bookkeeping involved with handling partitions and defining
the density of a partition, which is the parameter governing the quality of the reduction from
set-disjointness.

We present the reduction and its analysis in an order that emphasizes the similarities with the
reductions of [23] and [38], while encapsulating the differences in some lemmas that are proved in
later sections. Section 6 includes the standard argument that a small tree-like refutation yields a
low-communication search protocol, although some work is needed to guarantee that the search
protocol works for a partition of density 0(1). Section 7 details the reduction and proves the lower
bound, modulo a lemma about the distribution on the gadgets used to build the reduction, Lemma 6.
The marquee lower bound is presented in Subsection 7.1, Theorem 8.

In Section [HI we construct the objects claimed in Lemma 6. The distribution is very far from
uniform, and this makes the analysis quite different from that of [38]. However, to make the
reduction work, we need only two properties to hold. The first is that the probabilities assigned to
objects at Hamming distance f2(l) differ by at most a constant factor (encapsulated as Lemma 13,
the "continuity lemma"), and the second is that events ensuring correctness of the reduction occur
with probability not too much less than they would under the uniform distribution (encapsulated
as Lemma 12, the "completeness lemma"). Because the reduction is based on randomly flinging
gadgets into the dense corners of a graph, the distributions get messy and it seems wise to pass
to a cleaner framework as soon as possible. We call this framework distributions from dependent
domains with blocking processes, or DDWB distributions. Section 10 lays out the notation used for
the probability calculations and states some simple calculations that are needed, while Section 11
is devoted to DDWB distributions and their properties. In Section 12 we show that the distribution
of Lemma 6 is a DDWB distribution and use this to prove the continuity lemma and the completeness
lemma, which guarantee the correctness of the reduction.

1.5 Open Questions 

The main question left open by this paper is to increase the constants for Theorem 8. The constant
hidden in the $\Omega()$ of the $2^{\Omega(\sqrt[7]{N/\log N})}$ lower bound of Theorem 8 is
extremely small. Not logician small, but somewhere above Ramsey theorist small and way below
computer scientist small. It is well below 2"^'^''. It is doubtful that this is the strongest
refutation-size lower bound that holds for the system, even for these particular CNFs.

The next question is whether or not we can go from the tree-like to the DAG-like case, i.e., can
a superpolynomial size lower bound be proved for DAG-like OBDD refutations of some family of
CNFs? This would fully resolve the question posed in [3].

What can be said about the expected size of a (tree-like) OBDD refutation of a random 3-CNF? 
This is open even for the explicit OBDD construction method. It would be especially interesting if 
such an analysis could explain some of the threshold behavior observed in [14] [T].

It is common for OBDD packages to include a feature that dynamically recomputes the variable 
ordering when the OBDDs grow too large. The analysis of Theorem 8 does not cover this, as the
conversion from refutation to search (Lemma 3) seems to depend on every OBDD in a derivation
using the same variable ordering. Current work with symbolic quantifier elimination algorithms 
for satisfiability has suggested that, given current technology, static variable orderings generally 
lead to better performance than dynamic variable orderings [H [22]. This may be because these 
studies compare a default dynamic reordering heuristic against a static order that is customized 
for the satisfiability problem. A dynamic variable reordering method that consistently outperforms 
static methods remains unseen. On the other hand, there is no explanation of why static orderings 
should perform just as well as dynamic orderings. An interesting extension of this work would be 
to find a proof system that formalizes OBDD-proofs that include dynamic variable reordering and 
to use this to formally compare methods that use dynamic reordering with those that use static 
variable orderings. And of course, proving unconditional proof size lower bounds for algorithms 
that incorporate dynamic variable reordering would be interesting. 

To the best of our knowledge, no non-trivial size lower bounds are known for any of the
compressed methods [10] [11] [33] [34] [35]. Because these systems work with OBDDs, there is a
similar flavor with the systems studied in this article. However, the fact that the systems build
OBDDs in different variables than those of the input CNF prevents an immediate application of
Theorem 8 to these systems.

2 Notation and Communication Complexity Background

Definition 2.1 The real numbers are denoted by $\mathbb{R}$ and $[0,1]$ denotes the closed unit
interval. Let $n$ be an integer. The set of integers $\{1, \ldots, n\}$ is denoted by $[n]$. For a
set $S$ and a non-negative integer $k$, the set of all $k$-tuples over $S$ is denoted by $S^k$ and
the set of all size-$k$ subsets of $S$ is denoted by $\binom{S}{k}$. For a set $S$ we let $\chi_S$
denote the indicator function for $S$, with $\chi_S(\sigma) = 1$ if $\sigma \in S$ and
$\chi_S(\sigma) = 0$ if $\sigma \notin S$. The domain of $\chi_S$ will always be clear from context.
For a product space $\prod_{i \in I} S_i$ where $I$ is a finite set, we will sometimes say that the
product space is "$|I|$ dimensional" even though there is no algebraic structure defined on
$\prod_{i \in I} S_i$.

Note that (f^') is a set with 

Definition 2.2 We use the word "graph" to mean a simple, loopless undirected graph. We use
$\subseteq$ to denote the (not necessarily induced) subgraph relation, i.e., $G \subseteq H$ if
$G = (V,E)$ and $H = (W,F)$ with $V \subseteq W$ and $E \subseteq F$ (as sets). For any two
disjoint nonempty sets $A$ and $B$, we write $K(A,B)$ to denote the complete bipartite graph with
partition $\{A,B\}$. Let $G = (V,E)$ be a graph. Let $V_0 \subseteq V$ and let $E_0 \subseteq E$.
The set of edges $E_0$ restricted to $V_0$, written $E_0[V_0]$, is defined as
$E_0[V_0] = \{e \in E_0 \mid e \subseteq V_0\}$.

We use standard results on the randomized two-party communication complexity of the
set-disjointness function. For a more thorough introduction to this subject, consult [28].

Definition 2.3 Let $f(X,Y)$ be a function. A randomized two-player protocol for $f$ is a two-party
communication protocol in which Player I has private access to $X$, Player II has private access
to $Y$, and the players share access to a source of random bits, so that for all inputs $X$ and $Y$,
with probability at least 2/3, the players agree upon the correct value of $f(X,Y)$. A deterministic
protocol is one in which the answer arrived at by the players is independent of any randomness
and is uniquely determined by the input $X, Y$. The cost of a protocol is the maximum number
of bits communicated between the two players, taken over settings of the input and the random
bits. The randomized communication complexity of $f$ is the minimum cost of a randomized
two-player protocol that computes $f$. The set-disjointness function on $n$ bits is a Boolean
function $setdisj_n : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$ with

$$setdisj_n(X, Y) = \begin{cases} 1 & \text{if } \exists i \in [n],\ X_i = Y_i = 1 \\ 0 & \text{otherwise} \end{cases}$$

Theorem 1 ([25], [39], cf. [28]) The two-party randomized communication complexity of $setdisj_n$
is $\Omega(n)$.







3 The Ordered-Binary Decision Diagrams Refutation System 



Definition 3.1 (cf. [31]) A binary decision diagram (also known as a branching program) is a
rooted, directed acyclic graph in which every nonterminal node $u$ is labeled by a variable $x_u$
and has two out-arcs, one to a node $t_u$ and the other to a node $f_u$. Sinks are labeled by
Boolean values. The function represented by a branching program is calculated by starting at the
root and following a path to a sink as follows: If the current node $u$ is labeled by the variable
$x_u$, and $x_u$ is assigned the value true, then follow the arc to $t_u$, otherwise follow the arc
to $f_u$. The value that the function takes is the value labeled on the sink. The size of a binary
decision diagram is its number of nodes as a DAG. An ordered binary decision diagram (OBDD) is a
binary decision diagram in which along every path from the source to a sink, every variable is
queried at most once, and there is a fixed ordering of the variables $\prec$ so that along all paths
from the source to a sink, the order in which variables are queried is consistent with $\prec$.

For the purposes of our argument, we do not care if the OBDDs are reduced to canonical normal
form. Indeed, all that is actually used about OBDDs is a simple connection between OBDDs and
communication complexity that is the starting point for our reduction. We do not use it explicitly
in this article; however, it is an ingredient for the proof of Lemma 3.

Proposition: If there is a size $S$ OBDD for a function $f(x_1, \ldots, x_n)$ with respect to some
variable order $x_{i_1}, \ldots, x_{i_n}$, then for each $k \in [n]$, there is a two-party
communication protocol computing $f$ with respect to the variable partition
$\{x_{i_1}, \ldots, x_{i_k}\}$, $\{x_{i_{k+1}}, \ldots, x_{i_n}\}$ that uses
$\lceil \log S \rceil$ many bits of communication.

Proof sketch: The first player broadcasts the index of the node that is reached in the OBDD
after following the path given by the assignment to $\{x_{i_1}, \ldots, x_{i_k}\}$. The second
player continues the computation from this node, using the values of
$\{x_{i_{k+1}}, \ldots, x_{i_n}\}$. No further communication is necessary because of the ordering
on queries. ■
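
The protocol in the proof sketch can be simulated directly. The sketch below uses a hand-written
OBDD for $(x_1 \wedge y_1) \vee (x_2 \wedge y_2)$ under the order $x_1, x_2, y_1, y_2$; the node
names, the dictionary layout and the helpers `walk` and `protocol` are assumptions chosen for
illustration, not notation from this article.

```python
from itertools import product
from math import ceil, log2

# node -> (queried variable, child if false, child if true); "0" and "1" are the sinks.
OBDD = {
    "n0": ("x1", "n1", "n2"),
    "n1": ("x2", "0", "n5"),    # x1 = 0: the function is x2 and y2
    "n2": ("x2", "n3", "n4"),   # x1 = 1
    "n3": ("y1", "0", "1"),     # remaining function: y1
    "n4": ("y1", "n5", "1"),    # remaining function: y1 or y2
    "n5": ("y2", "0", "1"),     # remaining function: y2
}


def walk(node, assignment):
    """Follow arcs from `node` for as long as the queried variable is in `assignment`."""
    while node in OBDD and OBDD[node][0] in assignment:
        var, low, high = OBDD[node]
        node = high if assignment[var] else low
    return node


def protocol(x_part, y_part):
    """Return (value of the function, number of bits Player I sent)."""
    names = sorted(OBDD) + ["0", "1"]       # enumeration of all S nodes, known to both players
    bits = ceil(log2(len(names)))
    reached = walk("n0", x_part)            # Player I's private computation
    message = format(names.index(reached), f"0{bits}b")   # the only communication
    answer = walk(names[int(message, 2)], y_part)         # Player II finishes the path
    return answer == "1", len(message)


for x1, x2, y1, y2 in product([False, True], repeat=4):
    value, sent = protocol({"x1": x1, "x2": x2}, {"y1": y1, "y2": y2})
    assert value == ((x1 and y1) or (x2 and y2))
print("bits sent:", sent)
```

Here $S = 8$ (six internal nodes and two sinks), so the single message is three bits long.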

It is easy to see that the size of the OBDD representing a clause is no more than the size of
the clause, plus the two sink nodes for "true" and "false". For this reason, we do not distinguish
between a clause and its OBDD with respect to some order.

Proposition: Let $C$ be a clause containing $l$ literals. For every variable ordering, $C$ can be
represented by an OBDD of size at most $l + 2$.



Definition 3.2 Let $C$ be a set of clauses in variables from a set $V$. An OBDD derivation from $C$
with respect to a variable ordering $\prec$ on $V$ is a sequence of OBDDs $F_1, \ldots, F_m$ so
that each OBDD is built from the variables of $V$ with respect to the order $\prec$, and each $F_i$
either is a clause in $C$, or follows from the preceding $F_1, \ldots, F_{i-1}$ by an application
of one of the following inference rules ($A$, $A_0$, and $B$ are OBDDs in the variables $V$ with
ordering $\prec$, where $A \Rightarrow A_0$ as Boolean functions, and $x$, $y$, $z$ are tuples of
variables from $V$):

Subsumption: $\dfrac{A}{A_0}$    Conjunction: $\dfrac{A(x,y) \quad B(y,z)}{A(x,y) \wedge B(y,z)}$    Projection: $\dfrac{A(x,y)}{\exists x\, A(x,y)}$

For a set of clauses $C$, an OBDD refutation of $C$ is a derivation from $C$ whose final line is the
OBDD "false". The size of an OBDD refutation is the sum of the sizes of its OBDDs. An OBDD
derivation $F_1, \ldots, F_m$ is said to be tree-like if each $F_i$ is used at most once as an
antecedent to an inference.

It is easily checked that the symbolic quantifier elimination algorithms for satisfiability all
generate tree-like OBDD refutations in the above system when run on unsatisfiable CNFs
[18] [H [22] [36] (so long as a dynamic variable reordering package is not in use).

The lower bound of Theorem 8 actually pertains to many different formulations of the tree-like
OBDD refutation system. In particular, most sensible inference rules and axioms can be added and
the lower bound will still apply. This is because the conversion from refutation to search protocols
(cf. [23] [3]) requires only that (1) the refutation structure is tree-like, (2) the OBDDs are in
the same variables as the input CNF, (3) the OBDDs are each built according to the same variable
ordering, and (4) the inference rules are sound and of fan-in at most two. Lemma 2 of the current
work requires that the proof structure is preserved under simultaneous permutations of the
variables (such a substitution does change the variable ordering $\prec$, however).

4 The Difficult CNF: Indirect Matching Principles 

The CNF $IndMatch_m$ is a propositional encoding of the fact that in a graph on $3m$ vertices, it
is impossible to simultaneously have a perfect matching on $2m$ vertices and an independent set of
size $2m + 1$. It is similar to the CNF $Match_m$ used by Impagliazzo, Pitassi, and Urquhart to
prove size lower bounds for the tree-like cutting planes system [23]. However, in order to prove the
CNFs difficult for tree-like OBDD refutations with respect to any variable ordering, we introduce a
level of indirection via permutations.

4.1 The CNF $Match_m$

There are two distinct kinds of variable used in the CNF $Match_m$:

1. The edge variables. There are $m \cdot \binom{3m}{2}$ many variables used to specify the
matching: One variable $x^i_e$ for each $i = 1, \ldots, m$ and each $e \in [3m]^2$. The intended
semantics is that the variable $x^i_e$ is equal to one if and only if the edge $e$ is the $i$'th
edge of the matching.

2. The vertex variables. There are $(2m + 1)3m = 6m^2 + 3m$ many variables used to specify
the independent set: One variable $y^j_k$ for each $j = 1, \ldots, 2m + 1$ and each
$k = 1, \ldots, 3m$. The intended semantics is that the variable $y^j_k$ is equal to one if and
only if the element $k$ is the $j$'th element of the independent set.

The set of all these variables is $MVars_m$. The following clauses form the CNF $Match_m$:

1. (At least $m$ edges in the matching.) For each $i \in [m]$: $\bigvee_{e \in [3m]^2} x^i_e$

2. (Edges form a matching.) For each $i, j \in [m]$ with $i \neq j$ and each $e, f \in [3m]^2$ with
$e \cap f \neq \emptyset$: $\neg x^i_e \vee \neg x^j_f$

3. (At least $2m + 1$ vertices in the independent set.) For each $j \in [2m + 1]$:
$\bigvee_{u \in [3m]} y^j_u$

4. (Vertices in the independent set are distinct.) For each $i, j \in [2m + 1]$ with $i \neq j$ and
each $u \in [3m]$: $\neg y^i_u \vee \neg y^j_u$

5. (The vertices are independent.) For each $e \in [3m]^2$ with $e = \{u, v\}$, each $k \in [m]$ and
each $i, j \in [2m + 1]$: $\neg y^i_u \vee \neg y^j_v \vee \neg x^k_e$

Notice that the CNF $Match_m$ has size $O(m^5)$.
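
As a sanity check on the five clause types, the following sketch generates them for small $m$ and
confirms by brute force that the case $m = 1$ is unsatisfiable. It rests on assumptions that go
beyond the text: edges are taken to be unordered pairs of distinct vertices, the positions in the
type 2 clauses range over $[m]$, and the names `match_cnf` and `satisfiable` are hypothetical.

```python
from itertools import combinations, product


def match_cnf(m):
    """Clauses of Match_m as lists of (variable, polarity) literals."""
    V = range(1, 3 * m + 1)              # vertices
    E = list(combinations(V, 2))         # edges, assumed to be unordered pairs
    I = range(1, m + 1)                  # positions in the matching
    J = range(1, 2 * m + 2)              # positions in the independent set

    def x(i, e):
        return ("x", i, e)

    def y(j, u):
        return ("y", j, u)

    cnf = []
    # 1. every matching position holds some edge
    cnf += [[(x(i, e), True) for e in E] for i in I]
    # 2. edges in different positions do not share a vertex
    cnf += [[(x(i, e), False), (x(j, f), False)]
            for i in I for j in I if i != j
            for e in E for f in E if set(e) & set(f)]
    # 3. every independent-set position holds some vertex
    cnf += [[(y(j, u), True) for u in V] for j in J]
    # 4. different positions hold different vertices
    cnf += [[(y(i, u), False), (y(j, u), False)]
            for i in J for j in J if i != j for u in V]
    # 5. no matching edge has both endpoints in the independent set
    cnf += [[(y(i, u), False), (y(j, v), False), (x(k, (u, v)), False)]
            for (u, v) in E for k in I for i in J for j in J]
    return cnf


def satisfiable(cnf):
    """Brute-force satisfiability test; only feasible for very small m."""
    variables = sorted({var for clause in cnf for var, _ in clause})
    for bits in product([False, True], repeat=len(variables)):
        val = dict(zip(variables, bits))
        if all(any(val[var] == pol for var, pol in clause) for clause in cnf):
            return True
    return False


print(satisfiable(match_cnf(1)))
for m in (1, 2, 3):
    print(m, sum(len(clause) for clause in match_cnf(m)))
```

The second loop prints the total number of literal occurrences, which is the quantity bounded by
the size estimate above.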

4.2 The CNF $IndMatch_m$



The difference between the CNF $IndMatch_m$ and the CNF $Match_m$ is that we add variables
specifying a permutation $\pi$, and for an assignment $A$ to $MVars_m$, we interpret the
independent set not as $\{u \mid \exists j \in [2m + 1], A(y^j_u) = 1\}$ but instead as
$\{\pi(u) \mid \exists j \in [2m + 1], A(y^j_u) = 1\}$.

Definition 4.1 Let $N$ be given. A set $\Pi$ of permutations of $[N]$ is said to be pairwise independent
if for all $a, b, c, d \in [N]$ with $a \neq b$ and $c \neq d$:

$$\Pr_{\pi \in \Pi}\left[\pi(a) = c \wedge \pi(b) = d\right] = \frac{1}{N(N-1)}$$

It is well-known that for any finite field $\mathbb{F}$, the set of mappings $\{x \mapsto ax + b \mid a \in \mathbb{F}^*,\ b \in \mathbb{F}\}$ is a
pairwise independent family of permutations of size $|\mathbb{F}|(|\mathbb{F}| - 1)$.
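The affine family can be checked exhaustively on a small field. The sketch below is illustrative only: it works over a prime field $\mathbb{F}_p$ (the paper needs a field of size $3m$, a power of 3, which would require extension-field arithmetic not implemented here), and it assumes the standard target probability $1/(N(N-1))$ for pairwise independence.

```python
from fractions import Fraction
from itertools import permutations

def affine_family(p):
    """All maps x -> (a*x + b) mod p with a != 0, each stored as a tuple of images."""
    return [tuple((a * x + b) % p for x in range(p))
            for a in range(1, p) for b in range(p)]

def is_pairwise_independent(family, n):
    """Check Pr[pi(a) = c and pi(b) = d] = 1/(n(n-1)) for all a != b, c != d."""
    target = Fraction(1, n * (n - 1))
    for a, b in permutations(range(n), 2):
        for c, d in permutations(range(n), 2):
            hits = sum(1 for pi in family if pi[a] == c and pi[b] == d)
            if Fraction(hits, len(family)) != target:
                return False
    return True
```

For $p = 5$ the family has $5 \cdot 4 = 20$ members, and for every admissible $(a, b, c, d)$ exactly one member maps $a \mapsto c$ and $b \mapsto d$, because two points determine an affine map.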

Proposition: Whenever $m$ is a power of 3, there is a pairwise-independent family of permutations
of $[3m]$, $\Pi_m$, with $|\Pi_m| = 9m^2 - 3m$.

The variables used in the CNF $IndMatch_m$ are the variables used in $Match_m$, along with
new variables for encoding a permutation: There are $l = \lceil \log(|\Pi_m|) \rceil$ many variables that encode
a permutation from $\Pi_m$. The intended semantics is that the variables $z_1, \ldots, z_l$ encode
the permutations of $\Pi_m$ in some surjective fashion. This set of permutation variables is denoted
$PVars_m$. The set of variables $IMVars_m$ is $MVars_m \cup PVars_m$. The CNF $IndMatch_m$ has the
same clauses of type 1, type 2, type 3 and type 4 that $Match_m$ has, whereas the clauses enforcing
independence are as follows:

(Independence between vertices after application of the permutation.) For each $a_1, \ldots, a_l \in \{0, 1\}$,
each $e \in [3m]^2$ with $e = \{u, v\}$, each $k \in [m]$ and each $i, j \in [2m+1]$, with $\pi$ denoting the
element of $\Pi_m$ encoded by $a$: $\bigvee_{t=1}^{l} z_t^{1-a_t} \vee \neg y^i_{\pi(u)} \vee \neg y^j_{\pi(v)} \vee \neg x^k_e$

Notice that the CNF $IndMatch_m$ has $O(m^7)$ many clauses, and size $O(m^7 \log m)$.

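Definition 4.2 Let $\pi$ be a permutation of $[3m]$. For each variable $v \in MVars_m$ we define

$$\pi(v) = \begin{cases} y^j_{\pi(u)} & \text{if } v = y^j_u \text{ for some } j \in [2m+1],\ u \in [3m] \\ x^i_e & \text{if } v = x^i_e \text{ for some } i \in [m],\ e \in \binom{[3m]}{2} \end{cases}$$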

Lemma 2 Let $\pi \in \Pi_m$ be fixed. If $\Gamma$ is a size $S$ refutation of $IndMatch_m$ with variable
ordering $v_1, \ldots, v_N$, then there is a size $S$ refutation of $Match_m$ that uses the variable ordering
$\pi(v_1), \ldots, \pi(v_N)$.

Proof: Let $\alpha$ be the assignment to $\vec{z}$ that selects the permutation $\pi^{-1}$. We apply the restriction $\alpha$
to $\Gamma$, and we see that the clauses of $IndMatch_m$ that are not satisfied are the non-independence
clauses that do not use any $z$ variables (i.e., all clauses of type 1, type 2, type 3 and type 4), and the
independence clauses of the form $\neg y^i_{\pi^{-1}(u)} \vee \neg y^j_{\pi^{-1}(v)} \vee \neg x^k_e$, for $i, j \in [2m+1]$, $u, v \in [3m]$, $k \in [m]$,
and $e \in \binom{[3m]}{2}$. We now replace every occurrence of the variable $y^j_u$ by $y^j_{\pi(u)}$. For the variable
ordering, this means that $y^j_{\pi(u)}$ takes the place of $y^j_u$ in the ordering. In each OBDD, each query to
$y^j_u$ is replaced by a query to $y^j_{\pi(u)}$. Every OBDD is now constructed according to the query order
$\pi(v_1), \ldots, \pi(v_N)$. It is easily checked that the proof structure is preserved under this substitution
so that the new derivation is a derivation with respect to the order $\pi(v_1), \ldots, \pi(v_N)$ in the sense of
Definition 3.2. Moreover, each clause $\neg y^i_{\pi^{-1}(u)} \vee \neg y^j_{\pi^{-1}(v)} \vee \neg x^k_e$ becomes $\neg y^i_u \vee \neg y^j_v \vee \neg x^k_e$, so that
the new refutation is a refutation of $Match_m$. ■



5 Variable Partitions and Their Densities 

The central task in the proof of Theorem 8 is to generate reductions from set-disjointness to the
false-clause search problem of $IndMatch_m$, given an arbitrary partitioning of the variables $IMVars_m$. In
this brief section we present the machinery for analyzing these partitions. We view the partition
as dividing the variables between two players: an edge player, with access to variables in $V_I$, and a vertex player, with
access to variables in $V_{II}$. In the reduction, the edge player will place his set-disjointness variables
$X_i$ on edge variables $x^i_e$ and the vertex player will place his set-disjointness variables $Y_i$ on vertex
variables $y^j_u$.

Definition 5.1 Let $m$ be a positive integer, and let $(V_I, V_{II})$ be a partition of $MVars_m$. For each
$i = 1, \ldots, m$, define $E_i(V_I)$ to be $\{e \in [3m]^2 \mid x^i_e \in V_I\}$. For each $j = 1, \ldots, 2m+1$, define $V_j(V_{II})$
to be $\{u \in [3m] \mid y^j_u \in V_{II}\}$. Except in the proof of Lemma 5, we do not discuss more than one
variable partition at a time, so we usually write $E_i$ instead of $E_i(V_I)$ and $V_j$ instead of $V_j(V_{II})$.
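As a small illustration of Definition 5.1, the row sets $E_i$ and $V_j$ can be read directly off a partition. This is a sketch under an assumed encoding (not from the paper): edge variables are tuples `("x", i, (u, v))` and vertex variables are tuples `("y", j, u)`; the function name is hypothetical.

```python
def rows_of_partition(m, V_I, V_II):
    """Return (E, V) with E[i] = edges e such that x^i_e is in V_I,
    and V[j] = vertices u such that y^j_u is in V_II."""
    E = {i: set() for i in range(1, m + 1)}
    V = {j: set() for j in range(1, 2 * m + 2)}
    for kind, row, obj in V_I:
        if kind == "x":          # only edge variables held by Player I count
            E[row].add(obj)
    for kind, row, obj in V_II:
        if kind == "y":          # only vertex variables held by Player II count
            V[row].add(obj)
    return E, V
```

Note that vertex variables in $V_I$ and edge variables in $V_{II}$ are ignored, which mirrors the fact that the reduction only ever places edge gadgets on Player I's variables and vertex gadgets on Player II's.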

It is helpful to think of the variables of $MVars_m$ as being organized into $m$ rows of edge variables
and $2m+1$ rows of vertex variables, with $E_i$ being the set of edge variables in row $i$ available to
Player I, and $V_j$ being the set of vertex variables in row $j$ available to Player II. A very important
complication is that for distinct $i_1, i_2 \in [m]$, it is possible that $E_{i_1} \neq E_{i_2}$. This means that not
only does the edge used in an assignment matter, but the identity of the variable specifying the edge
matters as well. The same complication is in play regarding the sets $V_{j_1}$ and $V_{j_2}$. Because the
identity of the variables matters, in contrast with the reduction of [38], we must treat the objects
seen by the players as assignments to the variables, not merely sets of vertices and edges.

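Definition 5.2 Let $(V_I, V_{II})$ be a partition of $MVars_m$. The density of $(V_I, V_{II})$, $\delta(V_I, V_{II})$, is
defined as follows, where for $\vec{i} \in [m]^2$ and $\vec{j} \in [2m+1]^5$ we write $E_{\vec{i}} = E_{i_1} \cap E_{i_2}$ and $V_{\vec{j}} = V_{j_1} \cap \cdots \cap V_{j_5}$, and $E[V]$ denotes the set of edges of $E$ with both endpoints in $V$:

$$\delta(V_I, V_{II}) = \frac{1}{m^2 (2m+1)^5 \binom{3m}{2}} \sum_{\vec{i} \in [m]^2} \sum_{\vec{j} \in [2m+1]^5} \left| E_{\vec{i}}[V_{\vec{j}}] \right|$$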



6 From Refutation to Search 

We transform small refutations of the $IndMatch_m$ principles into a low-communication protocol
for a search problem in the variables $MVars_m$.

Definition 6.1 Let $A$ be an assignment to $MVars_m$. We say that $A$ is non-degenerate if it satisfies
all of the clauses from $Match_m$ of type 1, type 2, type 3, and type 4. (Informally, this means that
the assignment selects $m$ distinct edges and $2m+1$ distinct vertices.) An edge $e \in \binom{[3m]}{2}$ is said to
be bad for $A$ if $e = \{u, v\}$ and there exist $i, j \in [2m+1]$, $k \in [m]$ with $A(y^i_u) = 1$, $A(y^j_v) = 1$, and
$A(x^k_e) = 1$.

Proposition: If $A$ is a non-degenerate assignment to $MVars_m$, then there exists an edge that is
bad for $A$.

Definition 6.2 Let $m$ be a positive integer, and let $(V_I, V_{II})$ be a partition of $MVars_m$. The search
problem $FindBadEdge_m(V_I, V_{II})$ is defined as follows:

1. Player I has private access to the variables of $V_I$.

2. Player II has private access to the variables of $V_{II}$.

3. Given a non-degenerate assignment $A$ to $MVars_m$, the players must find a bad edge of $A$.

The partition $(V_I, V_{II})$ of $MVars_m$ will play an important role in the quality of the reduction
from set-disjointness. We will see that the larger the density of the partition, the larger the
instances of set-disjointness that can be reduced to $FindBadEdge_m(V_I, V_{II})$. In particular, when
$\delta(V_I, V_{II}) = \Omega(1)$, $FindBadEdge_m(V_I, V_{II})$ requires communication $\Omega(m)$.

Lemma 3 There exists a constant $c > 0$ so that for all $m \geq 84651$, if there is a size $S$
tree-like OBDD refutation of $IndMatch_m$ then there is a partition $(V_I, V_{II})$ of $MVars_m$ so that
$\delta(V_I, V_{II}) \geq 2^{-13}$ and there exists a deterministic two-player protocol for the search problem
$FindBadEdge_m(V_I, V_{II})$ that uses at most $c \log S$ many bits of communication.

6.1 The Proof of Lemma 3

The following lemma follows from standard arguments.

Lemma 4 (cf. [23]) There exists a constant $c > 0$ so that for all $m$, and every partition $(V_I, V_{II})$
of $MVars_m$, if there is a tree-like OBDD refutation of $Match_m$ of size $S$ that uses a variable order
in which either every variable of $V_I$ precedes every variable of $V_{II}$, or vice-versa, then
there is a deterministic two-player protocol for $FindBadEdge_m(V_I, V_{II})$ that uses at
most $c \log S$ many bits of communication.

Lemma 5 For $m \geq 84651$, if there exists a size $S$ refutation of $IndMatch_m$, then there exists a
partition $(V_I, V_{II})$ of $MVars_m$, with $\delta(V_I, V_{II}) \geq 2^{-13}$, and a size $S$ refutation of $Match_m$ in
which every variable of $V_I$ precedes every variable of $V_{II}$, or vice-versa.

Proof: Let $v_1, \ldots, v_N$ be the variable ordering of $IMVars_m$ used by the refutation of $IndMatch_m$.
Let $i_0$ be the first position to split either the set of vertex variables or the set of edge variables
in half. More formally, for each $i = 1, \ldots, N$, let $vvars(i)$ be the number of vertex variables in
$\{v_1, \ldots, v_i\}$, let $evars(i)$ be the number of edge variables in $\{v_1, \ldots, v_i\}$, and let $i_0$ be the least integer with
either $evars(i_0) \geq \frac{m}{2} \binom{3m}{2}$ or $vvars(i_0) \geq \frac{2m+1}{2} \cdot 3m$. Notice that there are two possible cases:
The first is that $evars(i_0) \geq \frac{m}{2} \binom{3m}{2}$, so that $\{v_1, \ldots, v_{i_0}\}$ contains exactly $\frac{m}{2} \binom{3m}{2}$ many edge
variables and $\{v_{i_0+1}, \ldots, v_N\}$ contains at least $\frac{1}{2}(6m^2 + 3m)$ many vertex variables. The second
is that $vvars(i_0) \geq \frac{2m+1}{2} \cdot 3m$, so that $\{v_1, \ldots, v_{i_0}\}$ contains exactly $\frac{1}{2}(6m^2 + 3m)$ many vertex
variables and $\{v_{i_0+1}, \ldots, v_N\}$ contains at least $\frac{m}{2} \binom{3m}{2}$ many edge variables. In the first case, we
set $V_I = \{v_1, \ldots, v_{i_0}\}$ and $V_{II} = \{v_{i_0+1}, \ldots, v_N\}$. In the second case, we set $V_{II} = \{v_1, \ldots, v_{i_0}\}$ and
$V_I = \{v_{i_0+1}, \ldots, v_N\}$. In either case, $\frac{1}{m} \sum_{i=1}^{m} |E_i| \geq \frac{1}{2} \binom{3m}{2}$ and $\frac{1}{2m+1} \sum_{j=1}^{2m+1} |V_j| \geq \frac{1}{2} \cdot 3m$. Therefore,
by Lemma 15, $\frac{1}{m^2} \sum_{\vec{i} \in [m]^2} |E_{i_1} \cap E_{i_2}| \geq \frac{1}{4} \binom{3m}{2}$, and $\frac{1}{(2m+1)^5} \sum_{\vec{j} \in [2m+1]^5} |V_{j_1} \cap V_{j_2} \cap V_{j_3} \cap V_{j_4} \cap V_{j_5}| \geq \frac{3m}{32}$.
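
We now calculate the expected value of $\delta(\pi(V_I), \pi(V_{II}))$ over $\pi \in \Pi_m$. We begin by noting that
for all $i \in [m]$, $E_i(\pi(V_I)) = E_i(V_I) = E_i$ and for all $j \in [2m+1]$, $V_j(\pi(V_{II})) = \pi(V_j(V_{II})) = \pi(V_j)$.
For each $\vec{i} \in [m]^2$, let $E_{\vec{i}} = E_{i_1} \cap E_{i_2}$ and for each $\vec{j} \in [2m+1]^5$, let $V_{\vec{j}} = V_{j_1} \cap V_{j_2} \cap V_{j_3} \cap V_{j_4} \cap V_{j_5}$.
For each $\{u, v\} \in \binom{V_{\vec{j}}}{2}$, by the pairwise independence of the permutations, we have that:

$$\Pr_{\pi \in \Pi_m}\left[\{\pi(u), \pi(v)\} \in E_{\vec{i}}\right] = \sum_{\{a,b\} \in E_{\vec{i}}} \left( \Pr_{\pi \in \Pi_m}[\pi(u) = a, \pi(v) = b] + \Pr_{\pi \in \Pi_m}[\pi(u) = b, \pi(v) = a] \right) = \frac{2|E_{\vec{i}}|}{3m(3m-1)} = \frac{|E_{\vec{i}}|}{\binom{3m}{2}}$$

Therefore, by linearity of expectation, we have that:

$$E_{\pi \in \Pi_m}\left[ \left| E_{\vec{i}}[\pi(V_{\vec{j}})] \right| \right] = \binom{|V_{\vec{j}}|}{2} \frac{|E_{\vec{i}}|}{\binom{3m}{2}}$$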

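And thus we bound $E_{\pi \in \Pi_m}[\delta(\pi(V_I), \pi(V_{II}))]$ from below as follows (the first inequality uses the convexity of $t \mapsto \binom{t}{2}$ together with the bound on the average of $|V_{\vec{j}}|$):

$$\begin{aligned}
E_{\pi \in \Pi_m}\left[\delta(\pi(V_I), \pi(V_{II}))\right]
&= E_{\pi \in \Pi_m}\left[ \frac{1}{m^2 (2m+1)^5 \binom{3m}{2}} \sum_{\vec{i} \in [m]^2} \sum_{\vec{j} \in [2m+1]^5} \left| E_{\vec{i}}[\pi(V_{\vec{j}})] \right| \right] \\
&= \frac{1}{m^2 (2m+1)^5 \binom{3m}{2}} \sum_{\vec{i} \in [m]^2} \sum_{\vec{j} \in [2m+1]^5} E_{\pi \in \Pi_m}\left[ \left| E_{\vec{i}}[\pi(V_{\vec{j}})] \right| \right] \\
&= \frac{1}{m^2 (2m+1)^5 \binom{3m}{2}} \sum_{\vec{i} \in [m]^2} \frac{|E_{\vec{i}}|}{\binom{3m}{2}} \sum_{\vec{j} \in [2m+1]^5} \binom{|V_{\vec{j}}|}{2} \\
&\geq \frac{1}{m^2 \binom{3m}{2}} \sum_{\vec{i} \in [m]^2} \frac{|E_{\vec{i}}|}{\binom{3m}{2}} \binom{3m/32}{2} \\
&\geq \frac{1}{4} \cdot \frac{\binom{3m/32}{2}}{\binom{3m}{2}}
= \frac{1}{4} \cdot \frac{(3m/32)(3m/32 - 1)}{(3m)(3m-1)}
= \frac{1}{4 \cdot (32)^2} \cdot \frac{3m - 32}{3m - 1}
= 2^{-12} \left( 1 - \frac{31}{3m-1} \right)
\end{aligned}$$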

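Choose a permutation $\pi$ with $\sum_{\vec{i} \in [m]^2} \sum_{\vec{j} \in [2m+1]^5} \left| E_{\vec{i}}[\pi(V_{\vec{j}})] \right| \geq 2^{-12} \left( 1 - \frac{31}{3m-1} \right) m^2 (2m+1)^5 \binom{3m}{2}$.

By Lemma 2 there is a size $S$ refutation of $Match_m$ that uses the variable ordering $\pi(v_1), \ldots, \pi(v_N)$.
Notice that in this order, either every variable of $\pi(V_I)$ precedes every variable of $\pi(V_{II})$, or every
variable of $\pi(V_{II})$ precedes every variable of $\pi(V_I)$. By the above calculation, $\delta(\pi(V_I), \pi(V_{II})) \geq 2^{-12} \left( 1 - \frac{31}{3m-1} \right)$. Because $m \geq 84651$, we have $\frac{31}{3m-1} \leq 2^{-13}$, so $\delta(\pi(V_I), \pi(V_{II})) \geq 2^{-12}(1 - 2^{-13}) \geq 2^{-13}$.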
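The role of the threshold $m \geq 84651$ can be checked with exact rational arithmetic. This is a sketch under the assumption that the density bound derived in the proof reads $2^{-12}(1 - 31/(3m-1))$ and arises as $\frac{1}{4}\binom{3m/32}{2}/\binom{3m}{2}$; the function names are illustrative.

```python
from fractions import Fraction

THRESHOLD = 84651  # the bound on m stated in Lemma 5

def density_lower_bound(m):
    """The closed form 2^{-12} * (1 - 31/(3m - 1))."""
    return Fraction(1, 2**12) * (1 - Fraction(31, 3 * m - 1))

def ratio_form(m):
    """(1/4) * C(3m/32, 2) / C(3m, 2), with C(t, 2) = t(t-1)/2 for rational t."""
    t = Fraction(3 * m, 32)
    n = Fraction(3 * m)
    return Fraction(1, 4) * (t * (t - 1) / 2) / (n * (n - 1) / 2)
```

Since $3 \cdot 84651 - 1 = 253952 = 31 \cdot 2^{13}$, the quantity $31/(3m-1)$ equals $2^{-13}$ exactly at the threshold, which is why 84651 is the smallest $m$ for which this step of the argument goes through.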



To prove Lemma 3, simply take the partition of $MVars_m$ and the size $S$ refutation of $Match_m$
guaranteed by Lemma 5 and feed them into Lemma 4.

7 Reduction and Lower Bound



The correctness of the reduction from setdisjn to FindBadEdgem{Vi-, Vii) depends on the fohowing 
lemma: 

Lemma 6 (proof in Section\3^ For every (5 > 0, there exist cq, ci > so that for all m > 31(2/5)^, 
and all partitions of MVarSm, (V/,V//) with (5(V/,V//) > 5, for all n with n < com, there exists 
a set C, a distribution D on C with measure function fi, a function A : C x {0, l}" x {0, 1}" 
{0, ij^^v-ars™^ ^ function pe:C^ (I^^l) so that: 

1. For all $L \in \mathcal{L}$, all $(X, Y) \in \{0,1\}^n \times \{0,1\}^n$, and all $v \in V_I$, $A_{L,X,Y}(v)$ is determined by $L$ and $X$,
and for all $v \in V_{II}$, $A_{L,X,Y}(v)$ is determined by $L$ and $Y$.

2. For all $L \in \mathcal{L}$, all $(X, Y) \in \{0,1\}^n \times \{0,1\}^n$, the assignment $A_{L,X,Y}$ is non-degenerate.

3. For all $L \in \mathcal{L}$, all $(X, Y) \in \{0,1\}^n \times \{0,1\}^n$, and all $e \in \binom{[3m]}{2}$, if $e$ is bad for $A_{L,X,Y}$, then $e = pe(L)$
or $setdisj_n(X, Y) = 1$.

4. For all {X,Y) G {0, 1}" x {0, 1}" with setdisjn{X ,Y) = 1, there exists S C £ with > 
6^/2^ so that for all A G {A^ ^ ^ | L G 5}; 

max Li(pe(L) = e \ A^ </ ^ = A, L £ S) < 1 — ci 

It is helpful to think of $L \in \mathcal{L}$ as a "layout" guiding the construction of an $MVars_m$ assignment from
$X, Y$. $A_{L,X,Y}$ is simply the assignment constructed using layout $L$ with set-disjointness instance
$(X, Y)$. Condition 1 is the requirement that Player I can compute the value of $A_{L,X,Y}(v)$
for $v \in V_I$ without communicating with Player II, and that Player II can compute $A_{L,X,Y}(v)$ for
$v \in V_{II}$ without communication. Condition 2 guarantees that the assignment created is a valid
instance of the $FindBadEdge_m(V_I, V_{II})$ problem. The function $pe$ can be thought of as a "planted
bad edge": The reduction is based on the idea of having positions with $X_k = Y_k = 1$ create bad
edges. However, because the assignment is non-degenerate, there must always be some bad edge,
even when $setdisj_n(X, Y) = 0$. The players knowingly create one such edge and we call this edge
the planted edge for the layout, $pe(L)$. Condition 3 states that when $setdisj_n(X, Y) = 0$, the only
bad edge is the planted edge. Condition 4 states that when $setdisj_n(X, Y) = 1$, conditioned on the
layout coming from the set $S$, no assignment is overly correlated with a particular planted edge.

Lemma 7 For all 6 > 0, there exist Co,Ci > so that for all m > 31(2/5)^, for all partitions of 
MVarSm, (V/, V//), with 6(yi,Vii) > 5, for all n < Com, if there is a two-player deterministic 
protocol SEARCH that solves FindBadEdgemiVi,Vii) using r bits of communication, then the 
randomized communication complexity of setdisjn is < Cir. 

Proof: Let Co be the cq as in the statement of Lemma [H We give a one-sided reduction that 
never gives a wrong answer when setdisjn{X ,Y) = 0, and when setdisjn{X ,Y) = 1, it gives the 
correct answer with probability > ci6^/2^, where ci is the second constant guaranteed by Lemma [6l 
Repeating the protocol a constant number of times and returning 0 only if all runs produce 0
gives a protocol with correctness $\geq 2/3$.

1. Using public randomness, the players select a reduction layout $L$ according to the distribution
$\mathcal{D}$ guaranteed by Lemma 6.

2. The players run the protocol SEARCH using the assignment $A_{L,X,Y}$. Let $e$ be the edge
returned by the protocol SEARCH.

(a) If $pe(L) = e$ then return 0.

(b) If $pe(L) \neq e$ then return 1.

By Lemma 6, Condition 1, the players can compute the needed values of $A_{L,X,Y}$ with no
communication. By Lemma 6, Condition 2, the assignment $A_{L,X,Y}$ is non-degenerate, and is
therefore a legal input for the problem $FindBadEdge_m(V_I, V_{II})$. Consider the case when $X$ and $Y$
are disjoint. By Lemma 6, Condition 3, the only bad edge in $A_{L,X,Y}$ is $pe(L)$, so the protocol returns
0. Consider the case when $X$ and $Y$ are intersecting. Apply Lemma 6, Condition 4, and let $S$ be the
set guaranteed for the pair $X, Y$. Define the event $B$ as $B = \{L \in S \mid SEARCH(A_{L,X,Y}) = pe(L)\}$.
This is the event that the layout belongs to $S$ and the protocol gives an erroneous answer. Let
$\mathcal{A}_S = \{A_{L,X,Y} \mid L \in S\}$. For each $A \in \mathcal{A}_S$, let $S_A = \{L \in S \mid A_{L,X,Y} = A\}$ and let $B_A =
\{L \in B \mid A_{L,X,Y} = A\}$. Because the protocol SEARCH is deterministic, for each $A$, on the set
$B_A$ the function $L \mapsto pe(L)$ is the constant function taking the value returned by $SEARCH(A)$.
Therefore, by Lemma 6, Condition 4, for each $A \in \mathcal{A}_S$, $\mu(B_A) \leq (1 - c_1)\mu(S_A)$, and so:

$$\mu(B) = \sum_{A \in \mathcal{A}_S} \mu(B_A) \leq \sum_{A \in \mathcal{A}_S} \mu(S_A)(1 - c_1) = (1 - c_1)\mu(S)$$

Therefore fi{S \ B) > cifi{S) > ci6^/2^. Of course, S \ B is the event that L £ S and the 
protocol gives the answer 1. 
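The amplification step mentioned at the start of the proof can be sketched as follows. This is an illustration under stated assumptions, not part of the paper: `run_once` is a hypothetical callable standing in for one execution of the reduction, and `p` is any lower bound on the probability that a single run answers 1 on intersecting inputs.

```python
from fractions import Fraction

def repetitions_for_two_thirds(p):
    """Smallest number of independent runs whose failure probability
    (1 - p)**runs drops to at most 1/3, computed exactly."""
    p = Fraction(p)
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    failure, runs = Fraction(1), 0
    while failure > Fraction(1, 3):
        failure *= 1 - p
        runs += 1
    return runs

def amplified_answer(run_once, runs):
    """One-sided amplification: answer 0 only if every run answers 0."""
    return 1 if any(run_once() == 1 for _ in range(runs)) else 0
```

Because a single run never answers 1 on disjoint inputs, taking the OR of the runs preserves correctness on disjoint inputs while driving the error on intersecting inputs below $1/3$. The number of runs depends only on $p$, hence is a constant once $\delta$ is fixed.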



7.1 The Lower Bound 

Theorem 8 There exists a constant $C > 0$ so that for sufficiently large $m$, every tree-like OBDD
refutation of $IndMatch_m$ has size at least $2^{Cm}$.

Proof: Apply Theorem [1] and choose > and c* > so that for every n > N , randomized 
two-player protocols for solving setdisjn require > c*n bits of communication. Let Co and Ci be 
the constants of Lemma[71 and let m be so large that m > 31(2/(2~^^))^ = 31 • 2^^^ (so that we can 
apply Lemma[7]with 6 > 2~^^), and A^ < [ComJ (so that we can apply Theorem[T]). Set n = \_Com\ . 
Let $c > 0$ be the constant from Lemma 3. Let $\Gamma$ be a tree-like OBDD refutation of $IndMatch_m$
of size $S$. Because $m \geq 84651$, we may apply Lemma 3 and choose a partition $(V_I, V_{II})$ so that
$\delta(V_I, V_{II}) \geq 2^{-13}$ and a two-player deterministic communication protocol for $FindBadEdge_m(V_I, V_{II})$
that uses at most $c \log S$ bits of communication. By Lemma 7 there is a two-party randomized
communication protocol for $setdisj_n$ that exchanges at most $C_1 c \log S$ bits of
communication. Therefore, applying the communication bound for set-disjointness, $C_1 c \log S \geq
c^* n = c^* \lfloor C_0 m \rfloor$, and thus $S \geq 2^{c^* \lfloor C_0 m \rfloor / (C_1 c)}$. ■

Figure 1: The basic set-disjointness gadget. A bad edge corresponds to the situation when an edge
and both of its endpoints receive the label 1. The assignment uses: $x^{i_k}_{\{u_k, v_k\}} = X_k$, $x^{i_k}_{\{u_k, w_k\}} = \neg X_k$,
$y^{j_{k,1}}_{v_k} = 1$, $y^{j_{k,2}}_{u_k} = Y_k$, and $y^{j_{k,2}}_{w_k} = \neg Y_k$. Notice that $\{u_k, w_k\}$ is never a bad edge, and that $\{u_k, v_k\}$
is a bad edge if and only if $X_k = Y_k = 1$.



8 Reduction Layouts 

The reduction from set-disjointness works by randomly generating "reduction layouts". A reduction layout
is a framework for generating instances of the search problem from instances of set-disjointness, a
collection of gadgets. We now take a moment to discuss the gadgets underlying the reduction from
set-disjointness to the problem of finding a bad edge.

The basic idea is to create a bad edge for each $k$ with $X_k = Y_k = 1$. To do this without
communicating, the players use the public randomness to choose $u_k, v_k, w_k \in [3m]$ with the intent
to place $\{u_k, v_k\}$ in the matching if $X_k = 1$ and $\{u_k, w_k\}$ in the matching if $X_k = 0$, and to place
$v_k$ in the independent set no matter what, but to include $u_k$ if $Y_k = 1$ and to include $w_k$ if $Y_k = 0$.
Of course, we must specify which variables are used to place the gadget, and those variables must
be available to the players under the partition. The players use the public randomness to choose
$i_k \in [m]$ with $x^{i_k}_{\{u_k, v_k\}}, x^{i_k}_{\{u_k, w_k\}} \in V_I$ (equivalently, $\{u_k, v_k\}, \{u_k, w_k\} \in E_{i_k}$) and $j_{k,1}, j_{k,2} \in [2m+1]$
with $y^{j_{k,1}}_{v_k}, y^{j_{k,2}}_{u_k}, y^{j_{k,2}}_{w_k} \in V_{II}$ (equivalently, $v_k \in V_{j_{k,1}}$ and $u_k, w_k \in V_{j_{k,2}}$). The situation resembles
that in Figure 1, with a bad edge occurring only if $X_k = Y_k = 1$, and then only at $\{u_k, v_k\}$.
The reduction plants one of these gadgets for each $k = 1, \ldots, n$.
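The behaviour of a single gadget can be verified over all four input combinations. The sketch below is an illustration only: it abstracts away the row indices $i_k$, $j_{k,1}$, $j_{k,2}$ and simply records which edges and vertices receive the label 1, using placeholder vertex names.

```python
def gadget(X_k, Y_k, u="u", v="v", w="w"):
    """Labels assigned by one set-disjointness gadget.
    Edge {u, v} carries X_k and edge {u, w} carries its negation;
    vertex v is always included, u carries Y_k and w carries its negation."""
    edges = {frozenset((u, v)): X_k, frozenset((u, w)): 1 - X_k}
    vertices = {v: 1, u: Y_k, w: 1 - Y_k}
    return edges, vertices

def bad_edges(edges, vertices):
    """Edges labelled 1 whose two endpoints are both labelled 1."""
    return [e for e, label in edges.items()
            if label == 1 and all(vertices[t] == 1 for t in e)]
```

Exactly one edge and exactly two vertices are selected for every input, so the gadget always contributes one edge to the matching and two vertices to the set, and $\{u, v\}$ is bad precisely when $X_k = Y_k = 1$.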

Because there are $m$ edges in the matching and $2m+1$ vertices in the set, one more vertex
must be placed in addition to the two associated with each set-disjointness gadget. A final gadget
(thought of as being at position $n+1$) will contain the "planted bad edge", in which three vertices
$u_{n+1}$, $v_{n+1}$, and $w_{n+1}$ are all placed in the set, and the edge $\{u_{n+1}, w_{n+1}\}$ is included. Because
all three vertices are placed in the set, three variables $y^{j_{n+1,1}}_{u_{n+1}}$, $y^{j_{n+1,2}}_{v_{n+1}}$ and $y^{j_{n+1,3}}_{w_{n+1}}$ are needed with
$u_{n+1} \in V_{j_{n+1,1}}$, $v_{n+1} \in V_{j_{n+1,2}}$, and $w_{n+1} \in V_{j_{n+1,3}}$.

The basic idea of the reduction is to randomly plant these $n+1$ gadgets on disjoint variables.
However, to ensure that the probabilities work out as claimed in Lemma 6, we make use of the
density of the partition.

Definition 8.1 Fix a partition of $MVars_m$, $(V_I, V_{II})$. Set $\delta = \delta(V_I, V_{II})$. For each $i \in [m]$ let
$E_i = E_i(V_I)$ and for each $j \in [2m+1]$ let $V_j = V_j(V_{II})$. For each $i \in [m]$, let $N_3(i) = \{(j_1, j_2, j_3) \in
[2m+1]^3 \mid j_1 \neq j_2,\ j_2 \neq j_3,\ j_3 \neq j_1,\ |E_i[V_{j_1} \cap V_{j_2} \cap V_{j_3}]| \geq (\delta/3)\binom{3m}{2}\}$, and let $N_2(i) = \{(j_1, j_2) \in
[2m+1]^2 \mid \exists j_3 \in [2m+1],\ (j_1, j_2, j_3) \in N_3(i)\}$. Set $G = \{i \in [m] \mid |N_3(i)| \geq (\delta/12)(2m+1)^3\}$.
Of course, each of $G$, $N_3(\cdot)$, and $N_2(\cdot)$ depends upon the partition $(V_I, V_{II})$, but we drop that from the
notation as we will never discuss more than one partition at a time.

Figure 2: The set-disjointness gadget at the position with a planted bad edge. All three vertices
$u_{n+1}, v_{n+1}, w_{n+1}$ are placed in the set of vertices and the edge $\{u_{n+1}, w_{n+1}\}$ is placed in the set of
edges. The edge $\{u_{n+1}, w_{n+1}\}$ is a bad edge. The assignment uses: $x^{i_{n+1}}_{\{u_{n+1}, v_{n+1}\}} = 0$, $x^{i_{n+1}}_{\{u_{n+1}, w_{n+1}\}} = 1$,
$y^{j_{n+1,1}}_{u_{n+1}} = 1$, $y^{j_{n+1,2}}_{v_{n+1}} = 1$, $y^{j_{n+1,3}}_{w_{n+1}} = 1$.



Lemma 9 (Proof in the Appendix) Let $\delta \in [0, 1]$ and let $m$ be an integer $\geq 3/\delta$. Let
$(V_I, V_{II})$ be a partition of $MVars_m$ with $\delta(V_I, V_{II}) \geq \delta$. Then $|G| \geq (\delta/12) m$.

Definition 8.2 Fix an integer $m$ and a partition $(V_I, V_{II})$ of $MVars_m$. A reduction layout (with respect
to $(V_I, V_{II})$, of length $n$) is a tuple $(i_1, \ldots, i_{n+1}, (j_{1,1}, j_{1,2}), \ldots, (j_{n,1}, j_{n,2}), (j_{n+1,1}, j_{n+1,2}, j_{n+1,3}),
(u_1, v_1, w_1), \ldots, (u_{n+1}, v_{n+1}, w_{n+1}))$ from the set $[m]^{n+1} \times ([2m+1]^2)^n \times ([2m+1]^3) \times ([3m]^3)^{n+1}$ with
the following properties:

1. The indices $i_1, \ldots, i_{n+1}$ are distinct.

2. The indices $j_{1,1}, j_{1,2}, \ldots, j_{n,1}, j_{n,2}, j_{n+1,1}, j_{n+1,2}, j_{n+1,3}$ are distinct.

3. The integers $u_1, \ldots, u_{n+1}, v_1, \ldots, v_{n+1}, w_1, \ldots, w_{n+1}$ are distinct.

4. For each $k = 1, \ldots, n+1$, $\{u_k, v_k\} \in E_{i_k}$ and $\{u_k, w_k\} \in E_{i_k}$.

5. For each $k = 1, \ldots, n+1$, $u_k, v_k, w_k \in V_{j_{k,1}} \cap V_{j_{k,2}}$.

6. $u_{n+1}, v_{n+1}, w_{n+1} \in V_{j_{n+1,1}} \cap V_{j_{n+1,2}} \cap V_{j_{n+1,3}}$.

7. For all $k \in [n+1]$, $i_k \in G$.

8. $(j_{n+1,1}, j_{n+1,2}, j_{n+1,3}) \in N_3(i_{n+1})$.

9. For each $k \in [n]$, $(j_{k,1}, j_{k,2}) \in N_2(i_k)$.

The set of all reduction layouts of length $n$ with respect to $(V_I, V_{II})$ is denoted $\mathcal{L}_{m,n}(V_I, V_{II})$.
When $m$, $n$, and $(V_I, V_{II})$ are clear from context, we simply write $\mathcal{L}$ and call $L \in \mathcal{L}$ a reduction
layout.

When listing the elements of a reduction layout, we will abuse notation and write $(\vec{i}, \vec{j}, \vec{u}, \vec{v}, \vec{w})$ despite
the fact that a reduction layout is emphatically not a member of the set $[m]^{n+1} \times [2m+1]^{2n+3} \times
[3m]^{n+1} \times [3m]^{n+1} \times [3m]^{n+1}$. This matters for the purpose of computing Hamming distances.
The Hamming distance between two reduction layouts in $\mathcal{L}$ is their Hamming distance as elements
of the $3n+3$ "dimensional" product set $[m]^{n+1} \times ([2m+1]^2)^n \times ([2m+1]^3) \times ([3m]^3)^{n+1}$. In
particular, if two reduction layouts $L = (\vec{i}, \vec{j}, \vec{u}, \vec{v}, \vec{w})$ and $L^* = (\vec{i}^*, \vec{j}^*, \vec{u}^*, \vec{v}^*, \vec{w}^*)$ differ only in that
$(u_k, v_k, w_k) \neq (u^*_k, v^*_k, w^*_k)$ then they are at Hamming distance 1.
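The coordinate convention for Hamming distance can be made explicit in a few lines. This is a sketch with hypothetical helper names; it flattens a layout into its $3n+3$ coordinates, treating each pair or triple as a single coordinate.

```python
def layout_coordinates(i, j_pairs, j_triple, uvw_triples):
    """Flatten a layout into 3n + 3 coordinates:
    n + 1 row indices, n index pairs, one index triple, n + 1 vertex triples."""
    return (tuple(i)
            + tuple(tuple(p) for p in j_pairs)
            + (tuple(j_triple),)
            + tuple(tuple(t) for t in uvw_triples))

def hamming_distance(L1, L2):
    """Number of coordinates on which two flattened layouts differ."""
    if len(L1) != len(L2):
        raise ValueError("layouts must have the same length")
    return sum(1 for a, b in zip(L1, L2) if a != b)
```

Under this convention, changing all three of $u_k, v_k, w_k$ at one position costs distance 1 rather than 3, which is what the later continuity lemma relies on when it counts changed coordinates.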
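
Definition 8.3 Fix $m$, $n$, and a partition $(V_I, V_{II})$ of $MVars_m$. Let $L = (\vec{i}, \vec{j}, \vec{u}, \vec{v}, \vec{w})$ be a reduction
layout from $\mathcal{L}$, and let $X_1, \ldots, X_n, Y_1, \ldots, Y_n$ be a set-disjointness instance. We define an
assignment $A_{L,X,Y}$ to the variables of $MVars_m$ as follows: Set $I = \{i_1, \ldots, i_{n+1}\}$. Set $J =
\{j_{1,1}, j_{1,2}, \ldots, j_{n,1}, j_{n,2}, j_{n+1,1}, j_{n+1,2}, j_{n+1,3}\}$. Set $V = \{u_1, \ldots, u_{n+1}, v_1, \ldots, v_{n+1}, w_1, \ldots, w_{n+1}\}$. Let
$\beta$, $\beta = \beta(L)$, be the lexicographically first assignment to the variables $\{x^i_e \mid i \in [m] - I,\ e \in ([3m] - V)^2\}
\cup \{y^j_u \mid j \in [2m+1] - J,\ u \in [3m] - V\}$ so that $\beta$ defines a matching of size $m - n - 1$ and an
independent set of size $2(m - n - 1)$. Define $A_{L,X,Y}$ as follows:

$$A_{L,X,Y}(x^i_e) = \begin{cases}
\beta(x^i_e) & \text{if } i \in [m] - I \text{ and } e \in ([3m] - V)^2 \\
X_k & \text{if } i = i_k \text{ and } e = \{u_k, v_k\} \text{ for some } k \in [n] \\
\neg X_k & \text{if } i = i_k \text{ and } e = \{u_k, w_k\} \text{ for some } k \in [n] \\
1 & \text{if } i = i_{n+1} \text{ and } e = \{u_{n+1}, w_{n+1}\} \\
0 & \text{otherwise}
\end{cases}$$

$$A_{L,X,Y}(y^j_x) = \begin{cases}
\beta(y^j_x) & \text{if } j \in [2m+1] - J \text{ and } x \in [3m] - V \\
1 & \text{if } j = j_{k,1} \text{ and } x = v_k \text{ for some } k \in [n] \\
Y_k & \text{if } j = j_{k,2} \text{ and } x = u_k \text{ for some } k \in [n] \\
\neg Y_k & \text{if } j = j_{k,2} \text{ and } x = w_k \text{ for some } k \in [n] \\
1 & \text{if } j = j_{n+1,1} \text{ and } x = u_{n+1} \\
1 & \text{if } j = j_{n+1,2} \text{ and } x = v_{n+1} \\
1 & \text{if } j = j_{n+1,3} \text{ and } x = w_{n+1} \\
0 & \text{otherwise}
\end{cases}$$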

Notice that when both players have access to the layout $L$, condition 4 of Definition 8.2 ensures
that Player I can compute the assignment to all variables in $V_I$ by only consulting his private
set-disjointness variables, and conditions 5 and 6 similarly guarantee that Player II can compute the
assignment to all variables in $V_{II}$ by only consulting his private set-disjointness variables. This
accounts for Condition 1 of Lemma 6. The conditions 1, 2 and 3 of Definition 8.2 ensure that
$A_{L,X,Y}$ is well-defined and non-degenerate. This accounts for Condition 2 of Lemma 6.

Definition 8.4 Let $m$ and $n$ be given. Let $(V_I, V_{II})$ be a variable partition for $MVars_m$. Let $X$,
$Y$ be a set-disjointness instance, and let $L = (\vec{i}, \vec{j}, \vec{u}, \vec{v}, \vec{w})$ be a reduction layout from $\mathcal{L}_{m,n}$. The
planted edge for $X, Y, L$, $pe(L)$, is defined to be $\{u_{n+1}, w_{n+1}\}$.

Condition 3 of Lemma 6 is the content of the following lemma.

Lemma 10 (Proof in the Appendix) Let $L = (\vec{i}, \vec{j}, \vec{u}, \vec{v}, \vec{w})$ be a reduction layout. If $e$ is a
bad edge of $A_{L,X,Y}$ then $e = pe(L)$, or $e = \{u_l, v_l\}$ with $X_l = Y_l = 1$.

9 The Distribution on Reduction Layouts 

There is a technical point that we defer until after we describe the distribution: Why the experiment 
cannot "get stuck" and find itself in a position of attempting to choose an item from an empty 
set. For n a sufficiently small constant fraction of m, this is ruled out by some calculations that 
follow the description of the experiment. In the process that generates the distribution, we use the 
following auxiliary definitions: 



Definition 9.1 Let $E$ be a set of edges over $[3m]$, and define $\mathcal{K}_{1,2}(E) := \{(u, v, w) \in [3m]^3 \mid
v \neq w,\ \{u, v\} \in E,\ \{u, w\} \in E\}$. Let $X$ be a set. For $U \subseteq X$ define $pm_X(U) := \{(u, v) \in X^2 \mid
\{u, v\} \cap U \neq \emptyset\}$ and $tm_X(U) := \{(u, v, w) \in X^3 \mid \{u, v, w\} \cap U \neq \emptyset\}$. (The mnemonic for this
notation is "pairs over X that meet U" and "triples over X that meet U".)
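These three set constructions are simple enough to compute directly on small inputs. The sketch below is illustrative and assumes the side condition in $\mathcal{K}_{1,2}$ is $v \neq w$ (the condition is partly illegible in the source); vertices are taken to be $1, \ldots, n$ and the function names are hypothetical.

```python
from itertools import product

def k12(E, n_vertices):
    """Triples (u, v, w) with v != w such that {u, v} and {u, w} are both edges of E."""
    edges = {frozenset(e) for e in E}
    verts = range(1, n_vertices + 1)
    return {(u, v, w) for u, v, w in product(verts, repeat=3)
            if v != w and frozenset((u, v)) in edges and frozenset((u, w)) in edges}

def pm(X, U):
    """Pairs over X that meet U."""
    U = set(U)
    return {(u, v) for u, v in product(X, repeat=2) if {u, v} & U}

def tm(X, U):
    """Triples over X that meet U."""
    U = set(U)
    return {(u, v, w) for u, v, w in product(X, repeat=3) if {u, v, w} & U}
```

In the sampling experiment these sets act as blocking sets: removing $pm$ or $tm$ of the already-used indices from the candidate pool guarantees that each newly chosen pair or triple is disjoint from everything chosen so far.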

Definition 9.2 Let $(V_I, V_{II})$ be a variable partition for $MVars_m$. Let $G$, $N_3(\cdot)$, and $N_2(\cdot)$ be as
in Definition 8.1. The distribution $\mathcal{D}$ on $\mathcal{L}$ is given by the following experiment:

1. For each $k = 1, \ldots, n+1$: Choose $i_k$ from $G \setminus \{i_1, \ldots, i_{k-1}\}$.

2. Set $J = \emptyset$.

3. For each $k = 1, \ldots, n$:

(a) Uniformly choose $(j_{k,1}, j_{k,2})$ from $N_2(i_k) \setminus pm_{[2m+1]}(J)$.

(b) Set $J := J \cup \{j_{k,1}, j_{k,2}\}$.

4. Uniformly choose $(j_{n+1,1}, j_{n+1,2}, j_{n+1,3})$ from $N_3(i_{n+1}) \setminus tm_{[2m+1]}(J)$.

5. Set $J := J \cup \{j_{n+1,1}, j_{n+1,2}, j_{n+1,3}\}$.

6. Set $V^* = \emptyset$.

7. For each $k = 1, \ldots, n$:

(a) Uniformly choose $(u_k, v_k, w_k)$ from $\mathcal{K}_{1,2}(E_{i_k}[V_{j_{k,1}} \cap V_{j_{k,2}}]) \setminus tm_{[3m]}(V^*)$.

(b) Set $V^* = V^* \cup \{u_k, v_k, w_k\}$.

8. Uniformly choose $(u_{n+1}, v_{n+1}, w_{n+1})$ from $\mathcal{K}_{1,2}(E_{i_{n+1}}[V_{j_{n+1,1}} \cap V_{j_{n+1,2}} \cap V_{j_{n+1,3}}]) \setminus tm_{[3m]}(V^*)$.

9. Return the layout $(\vec{i}, \vec{j}, \vec{u}, \vec{v}, \vec{w})$.

Proposition: For all $L \in \mathcal{L}$, $\mu(L) > 0$.

The above proposition can be checked by iteratively noting that when we condition on the
experiment producing a prefix of $L$, the probability that it selects the next coordinate of $L$ is
non-zero.

The results of the following lemma guarantee that when $\gamma$ is sufficiently small with respect to
$\delta$, the experiment does not "get stuck". The proof is in the Appendix.

Lemma 11 Let 6 G [0, 1] and let m be an integer > 450/(5^ . Let (V/, Vn) be a partition of MVarSm 
with S(yi,Vii) > S. Let n given with 7 = For all runs of the experiment in Definition \9.^ 

and for each k = 1, . . . n: 

1. \G\{ii,...ik-i}\ > (((5/12) -7V- 

2. \N2{ik)\pm^2m+i]{J) > ((V3) - 27)(2m + 1)2 

3. |iV3(Wi) \t"^[2m+i](J)| > ((V3) - 37)(2m + 1)^ 

4. |/Ci,2 {E,, [y,,,, n y,,,,]) \ tm[3„](y*)| > (^^VlO - 37)(3m)3 
5. $|\mathcal{K}_{1,2}(E_{i_{n+1}}[V_{j_{n+1,1}} \cap V_{j_{n+1,2}} \cap V_{j_{n+1,3}}]) \setminus tm_{[3m]}(V^*)| \geq (\delta^2/10 - 3\gamma)(3m)^3$

The following two statements are used to prove Lemma 6. Their proofs depend upon calculations
regarding the distribution $\mathcal{D}$, and seem to be best put in the framework of "distributions from
dependent domains processes with blocking".

Definition 9.3 A reduction layout L = [i^j^u^v^w) is said to be /-switchable if {jn+i,2, ji,2) S 
Nsiii) and K{{un+i,ui},{vn+i,vi,Wn+i,wi}) C Si„+jyj„+i_i r]Vj„^^,, nVj,^^,.,] n Ei^[Vj,^^ nFj,,,]. 
Let cS' denote the set of l-switchable reduction layouts from C 

Lemma 12 ('^Completeness lemma", proof in Section U^) For all 6 > 0, for all m > 31(2/(5)^, 
all partitions (V/, V//) of MVarSm with d{Vi,Vii) > 6, for all n < 210.3.5!^ for all I € [n], 

Lemma 13 ("Continuity lemma", proof in the Appendix) For every $\delta > 0$, for every integer $d \geq 1$, for
all $m \geq 450/\delta^2$, for all partitions $(V_I, V_{II})$ of $MVars_m$ with $\delta(V_I, V_{II}) \geq \delta$, for all $n \leq (\delta^2/60)m$,
for all reduction layouts $L, L^* \in \mathcal{L}$ with $HD(L, L^*) \leq d$, $\mu(L^*) \geq (\delta^2/20)^{2d} e^{-3d} \cdot \mu(L)$.

9.1 The Proof of Lemma [6] 

To prove Lemma [6] we use the following helper lemma. 

Lemma 14 (Proof immediately follows that of Lemma 6) For all $\delta > 0$, all $m \geq 450/\delta^2$, all
partitions $(V_I, V_{II})$ of $MVars_m$ with $\delta(V_I, V_{II}) \geq \delta$, all $n \leq (\delta^2/20)m$, and all set-disjointness
instances $(X, Y)$, there exists an involution $f : \mathcal{S}^l \to \mathcal{S}^l$ so that for all $L \in \mathcal{S}^l$, $A_{L,X,Y} = A_{f(L),X,Y}$,
$pe(f(L)) \neq pe(L)$, and $\mu(f(L)) \geq \mu(L)(\delta^2/20)^{12} e^{-18}$.

Proof:(of Lemma [6] from Lemma Let 6 > he given. Set cq = 210.3.5;^ rn Let m > 31(2/5)^ 
and n < Co be given. Let (V/,V//) be a partition of MVarSm with (5(V/,V//) > 5. We take 
C = Cm,n{Vi, Vn) per Definition [821 ^ = ^m,n(V/, V//) per Definition [Ol A : {L, X,Y) ^ A^^^ 
per Definition 18.31 and pe per Definition 18. 4[ 

Condition [1] and Condition [2] follow immediately from Definition 18.21 and Condition [3] follows 
from LemmaHni What remains to be shown is that Condition[4]holds. Let {X, Y) G {0, 1}" x{0, 1}" 
with setdisjn{X ,Y) = 1 be given. Choose / G [n] with Xi = Yi = 1 and set S = SK By Lemma [T^ 
fi{S^) > 6^/2^. Set c = (52/20)i2g-i8 (j^^^ constant of Lemma [H) We now show that for all 
assignments A to MVarsm- 

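$$\max_{e} \mu\left(pe(L) = e \mid A_{L,X,Y} = A,\ L \in \mathcal{S}^l\right) \leq 1/(1 + c)$$

Let $A$ be an assignment to $MVars_m$ and let $e \in \binom{[3m]}{2}$ be given. Let $B^e_A = \{L \in \mathcal{S}^l \mid A_{L,X,Y} =
A,\ pe(L) = e\}$, and let $S_A = \{L \in \mathcal{S}^l \mid A_{L,X,Y} = A\}$. Take $f$ as guaranteed by Lemma 14.
Because $f$ maps $\mathcal{S}^l$ to $\mathcal{S}^l$, we have that $f(B^e_A) \subseteq \mathcal{S}^l$; because $A_{f(L),X,Y} = A_{L,X,Y} = A$, we have
that $f(B^e_A) \subseteq S_A$; and because $pe(f(L)) \neq pe(L) = e$, we have that $f(B^e_A) \subseteq S_A \setminus B^e_A$. Because
$f$ is an involution of $\mathcal{S}^l$, it is injective, and because $\mu(f(L)) \geq c\mu(L)$ for all $L$, we have that
$\mu(S_A \setminus B^e_A) \geq \mu(f(B^e_A)) \geq c\mu(B^e_A)$ and therefore $\mu(S_A) = \mu(S_A \setminus B^e_A) + \mu(B^e_A) \geq (1 + c)\mu(B^e_A)$.

Therefore: $\mu(pe(L) = e \mid A_{L,X,Y} = A,\ L \in \mathcal{S}^l) = \mu(B^e_A \mid S_A) = \frac{\mu(B^e_A)}{\mu(S_A)} \leq \frac{1}{1+c}$. Noting that
$1/(1 + c) = 1 - c/(1 + c)$, we set $c_1 = c/(1 + c)$ and we conclude the proof of Lemma 6. ■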

Figure 3: With layouts $L$ and $L^*$, when $X_l = Y_l = 1$, the sets of vertices and edges specified by the
assignments $A_{L,X,Y}$ and $A_{L^*,X,Y}$ are equal. Notice, however, that the planted edge under $L$ is $\{a, r\}$
whereas the planted edge under $L^*$ is $\{b, s\}$. (Panels: Layout $L$; locally observed assignment; Layout $L^*$.)

Proof: (of Lemma 14) Let $L = (\vec{i}, \vec{j}, \vec{u}, \vec{v}, \vec{w})$. We define $f(L) = (\vec{i}^*, \vec{j}^*, \vec{u}^*, \vec{v}^*, \vec{w}^*)$ below. The basic
idea is to modify the reduction layout $L$ by swapping some vertices between the gadgets at
positions $n+1$ and $l$ so that the planted edge changes but the assignment remains the same.
This is graphically illustrated in Figure 3. Because of the partitioning of the variables, it is not
immediately the case that $L^* = f(L)$ will be a reduction layout. Among other things, we need to ensure
that $\{u^*_l, w^*_l\} \in E_{i^*_l}$ and $(j^*_{n+1,1}, j^*_{n+1,2}, j^*_{n+1,3}) \in N_3(i^*_{n+1})$, which is where we make use of the
hypothesis that $L$ is $l$-switchable. We give the full definition of $L^*$ below, along with the case
analysis ensuring that the conclusions of the lemma hold.

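$$u^*_k = \begin{cases} u_l & \text{if } k = n+1 \\ u_{n+1} & \text{if } k = l \\ u_k & \text{otherwise} \end{cases}
\qquad
i^*_k = \begin{cases} i_{n+1} & \text{if } k = l \\ i_l & \text{if } k = n+1 \\ i_k & \text{otherwise} \end{cases}$$

$$j^*_{k,1} = \begin{cases} j_{n+1,3} & \text{if } k = l \\ j_{l,2} & \text{if } k = n+1 \\ j_{k,1} & \text{otherwise} \end{cases}
\qquad
j^*_{k,2} = \begin{cases} j_{n+1,1} & \text{if } k = l \\ j_{k,2} & \text{otherwise} \end{cases}
\qquad
j^*_{n+1,3} = j_{l,1}$$

$$v^*_k = \begin{cases} w_{n+1} & \text{if } k = l \\ v_k & \text{otherwise} \end{cases}
\qquad
w^*_k = \begin{cases} v_l & \text{if } k = n+1 \\ w_k & \text{otherwise} \end{cases}$$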

We now check each of the properties required by Lemma 14. This is just case analysis and
rewriting. However, in order to show that $f(L) \in \mathcal{L}$ we make use of the hypothesis that $L$ is
$l$-switchable.

The mapping $f$ is an involution. This is verified by iterating the definition of $f$. The details are
carried out in the Appendix.

$A_{L,X,Y} = A_{f(L),X,Y}$. This follows from expanding the definitions and doing a little bookkeeping;
we put the argument in the Appendix.

Footnote: The reader carefully checking the case analysis below will note that the definition of $l$-switchable is a bit stronger
than we need. See the discussion in Section 13.

$pe(L) \neq pe(f(L))$. Because $L = (\vec{i}, \vec{j}, \vec{u}, \vec{v}, \vec{w})$ is a reduction layout, $\{u_{n+1}, w_{n+1}\} \cap \{u_l, v_l\} = \emptyset$.
Applying Definition 8.4, we see that $pe(L) = \{u_{n+1}, w_{n+1}\} \neq \{u_l, v_l\} = \{u^*_{n+1}, w^*_{n+1}\} =
pe(f(L))$.

$\mu(f(L)) \geq \mu(L) \cdot (\delta^2/20)^{12} e^{-18}$. In order to show this, we need that $\mu(L) > 0$ (which holds because
$L \in \mathcal{L}$) and $\mu(f(L)) > 0$ (which depends on the fact that $f(L) \in \mathcal{L}$, which we show below). For
now we take the non-zero mass of $f(L)$ as a given. The differences between $L$ and $f(L)$ occur
only with: $i_l \neq i^*_l$, $i_{n+1} \neq i^*_{n+1}$, $(j_{n+1,1}, j_{n+1,2}, j_{n+1,3}) \neq (j^*_{n+1,1}, j^*_{n+1,2}, j^*_{n+1,3})$, $(j_{l,1}, j_{l,2}) \neq
(j^*_{l,1}, j^*_{l,2})$, $(u_l, v_l, w_l) \neq (u^*_l, v^*_l, w^*_l)$, and $(u_{n+1}, v_{n+1}, w_{n+1}) \neq (u^*_{n+1}, v^*_{n+1}, w^*_{n+1})$. Therefore
$HD(L, f(L)) \leq 6$. We apply Lemma 13 to deduce that $\mu(f(L)) \geq \mu(L) \cdot (\delta^2/20)^{12} e^{-18}$.

For each $L \in \mathcal{S}^l$, $f(L) \in \mathcal{S}^l$. First we check that $f(L) = (i^*, j^*, u^*, v^*, w^*)$ is indeed a reduction layout.
We check each property from Definition 8.2.

1. The indices $i^*_1, \ldots, i^*_{n+1}$ are distinct: This holds because $i^*$ is a permutation of $i$.

2. The indices $j^*_{1,1}, j^*_{1,2}, \ldots, j^*_{n,1}, j^*_{n,2}, j^*_{n+1,1}, j^*_{n+1,2}, j^*_{n+1,3}$ are distinct: This holds because $j^*$
is a permutation of $j$.

3. The integers $u^*_1, \ldots, u^*_{n+1}, v^*_1, \ldots, v^*_{n+1}, w^*_1, \ldots, w^*_{n+1}$ are distinct: This is true because $u^*_1, \ldots, u^*_{n+1}$,
$v^*_1, \ldots, v^*_{n+1}$, $w^*_1, \ldots, w^*_{n+1}$ is a permutation of $u_1, \ldots, u_{n+1}, v_1, \ldots, v_{n+1}, w_1, \ldots, w_{n+1}$.

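4. For each $k = 1, \ldots, n+1$, $\{u^*_k, v^*_k\} \in E_{i^*_k}$ and $\{u^*_k, w^*_k\} \in E_{i^*_k}$: Because $L$ is $l$-switchable, every edge of
$K(\{u_l, u_{n+1}\}, \{v_l, v_{n+1}, w_l, w_{n+1}\})$ belongs to both $E_{i_l}$ and $E_{i_{n+1}}$, so
we have that $\{u^*_l, v^*_l\} = \{u_{n+1}, w_{n+1}\} \in E_{i_{n+1}} = E_{i^*_l}$, $\{u^*_l, w^*_l\} = \{u_{n+1}, w_l\} \in E_{i_{n+1}} = E_{i^*_l}$,
$\{u^*_{n+1}, v^*_{n+1}\} = \{u_l, v_{n+1}\} \in E_{i_l} = E_{i^*_{n+1}}$, and $\{u^*_{n+1}, w^*_{n+1}\} = \{u_l, v_l\} \in E_{i_l} = E_{i^*_{n+1}}$.
For $k \in [n] \setminus \{l\}$, we have that $\{u^*_k, v^*_k\} = \{u_k, v_k\} \in E_{i_k} = E_{i^*_k}$ and $\{u^*_k, w^*_k\} = \{u_k, w_k\} \in
E_{i_k} = E_{i^*_k}$.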

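5. For each $k = 1, \ldots, n+1$, $\{u^*_k, v^*_k, w^*_k\} \subseteq V_{j^*_{k,1}} \cap V_{j^*_{k,2}}$: Because

\[ K(\{u_l, u_{n+1}\}, \{v_l, v_{n+1}, w_l, w_{n+1}\}) \subseteq E_{i_{n+1}}[V_{j_{n+1,1}} \cap V_{j_{n+1,2}} \cap V_{j_{n+1,3}}] \cap E_{i_l}[V_{j_{l,1}} \cap V_{j_{l,2}}] \]

we have that $\{u^*_l, v^*_l, w^*_l\} = \{u_{n+1}, w_{n+1}, w_l\} \subseteq V_{j_{n+1,3}} \cap V_{j_{n+1,1}} = V_{j^*_{l,1}} \cap V_{j^*_{l,2}}$. For the
same reason, $\{u^*_{n+1}, v^*_{n+1}, w^*_{n+1}\} = \{u_l, v_{n+1}, v_l\} \subseteq V_{j_{l,2}} \cap V_{j_{n+1,2}} = V_{j^*_{n+1,1}} \cap V_{j^*_{n+1,2}}$. For
$k \in [n] \setminus \{l\}$, we have that $\{u^*_k, v^*_k, w^*_k\} = \{u_k, v_k, w_k\} \subseteq V_{j_{k,1}} \cap V_{j_{k,2}} = V_{j^*_{k,1}} \cap V_{j^*_{k,2}}$.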

6. We have that $\{u^*_{n+1}, v^*_{n+1}, w^*_{n+1}\} = \{u_l, v_{n+1}, v_l\} \subseteq V_{j_{l,1}} = V_{j^*_{n+1,3}}$, because

7. For each $k \in [n+1]$, $i^*_k \in G$: This holds because $i^*$ is a permutation of $i$ and for each
$k \in [n+1]$, $i_k \in G$.

8. $(j^*_{n+1,1}, j^*_{n+1,2}, j^*_{n+1,3}) \in N_3(i^*_{n+1})$: Because $L$ is $l$-switchable, $(j_{n+1,2}, j_{l,1}, j_{l,2}) \in N_3(i_l)$;
therefore, $(j^*_{n+1,1}, j^*_{n+1,2}, j^*_{n+1,3}) = (j_{l,2}, j_{n+1,2}, j_{l,1}) \in N_3(i_l) = N_3(i^*_{n+1})$.

9. For each $k = 1, \ldots, n$: $(j^*_{k,1}, j^*_{k,2}) \in N_2(i^*_k)$. For $k \in [n] \setminus \{l\}$, we have that $(j^*_{k,1}, j^*_{k,2}) =
(j_{k,1}, j_{k,2}) \in N_2(i_k) = N_2(i^*_k)$. When $k = l$, because $L$ is a reduction layout, we have
that $(j_{n+1,1}, j_{n+1,2}, j_{n+1,3}) \in N_3(i_{n+1})$, and therefore $(j_{n+1,3}, j_{n+1,1}) \in N_2(i_{n+1})$. Thus:

\[ (j^*_{l,1}, j^*_{l,2}) = (j_{n+1,3}, j_{n+1,1}) \in N_2(i_{n+1}) = N_2(i^*_l). \]
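This establishes that $f(L) \in \mathcal{L}$. That $f(L) \in \mathcal{S}^l$ follows immediately from the hypothesis that
$L \in \mathcal{S}^l$ and the definitions: $(j^*_{n+1,2}, j^*_{l,1}, j^*_{l,2}) = (j_{n+1,2}, j_{n+1,3}, j_{n+1,1}) \in N_3(i_{n+1}) = N_3(i^*_l)$ and

\[ K(\{u^*_l, u^*_{n+1}\}, \{v^*_l, v^*_{n+1}, w^*_l, w^*_{n+1}\}) = K(\{u_l, u_{n+1}\}, \{v_l, v_{n+1}, w_l, w_{n+1}\}) \]

\[ \subseteq E_{i^*_l}[V_{j^*_{l,1}} \cap V_{j^*_{l,2}}] \cap E_{i^*_{n+1}}[V_{j^*_{n+1,1}} \cap V_{j^*_{n+1,2}} \cap V_{j^*_{n+1,3}}] \]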



10 Probability Notation and Background 

Definition 10.1 Let $X_i$, $i \in I$, be a family of sets indexed by a set $I$; we write $X_I$ as an abbreviation
for the product $\prod_{i \in I} X_i$. Let $\prod_{i \in I} X_i$ and $\prod_{j \in J} X_j$ be product spaces with $I \cap J = \emptyset$. For $x \in \prod_{i \in I} X_i$
and $y \in \prod_{j \in J} X_j$ we write $xy$ to denote the concatenation of $x$ and $y$ (an element of $\prod_{i \in I \cup J} X_i$).
We use the same indices for elements in tuples as we do for the factors of the product, i.e. for
$u \in \prod_{i=j}^{t} X_i$ we write $u = (u_j, \ldots, u_t)$; we do not write $u = (u_1, \ldots, u_{t-j+1})$. Let $f$ be a function
whose domain is a product space $\prod_{i=1}^{t} X_i$. For each $j \in [t]$, for each $x \in \prod_{i=1}^{j} X_i$, we write $f^x$ to
denote the curried function with domain $\prod_{i=j+1}^{t} X_i$, that is, $f^x(y) = f(xy)$.

Definition 10.2 Let $\eta$ be a probability distribution over a set $X$ and let $f : X \to \mathbb{R}$. We write
$E_\eta[f]$ to denote the expectation of $f$ with respect to $\eta$. At times, the uniform distribution over a set
will be written as $U$. Other times, with $E \subseteq S$, we will write $\Pr_{x \in S}[E]$ to denote the
probability that $x \in E$ holds when $x$ is selected uniformly from $S$.

Definition 10.3 Let $\eta$ be a probability distribution on a product space $\prod_{i=1}^{t} X_i$. For each $I \subseteq [t]$,
let $\eta_I$ be the marginal distribution of $\eta$ on $\prod_{i \in I} X_i$. For each $j \in [t]$ and each $x \in \prod_{i=1}^{j} X_i$, let $\eta^x$
be the probability distribution on $\prod_{i=j+1}^{t} X_i$ given by the formula $\eta^x(y) = \eta(xy)/\eta_{[j]}(x)$ if $\eta_{[j]}(x) > 0$, and $0$
otherwise.

Notice that $\eta^y$ is the marginal distribution of $\eta$ to the coordinates $[t] \setminus [j]$ conditioned on the
event that the first $j$ coordinates take the value $y$. An immediate consequence of the definitions:

Lemma 15 Let $f : \prod_{i=1}^{t} X_i \to \mathbb{R}$, let $I = \{1, \ldots, i\}$: $E_\eta[f] = \sum_{u \in X_I} \eta_I(u) E_{\eta^u}[f^u]$

Unsurprisingly for a technique based on finding structure in a dense family of sets, we beat
the stuffing out of Jensen's Inequality, its relatives, and any averaging arguments that we find in the
neighborhood.

Proposition: (Jensen's Inequality) Let $f : D \to \mathbb{R}$, let $g : \mathbb{R} \to \mathbb{R}$ be a convex function, and let $\eta$
be a probability distribution on $D$. Then $E_\eta[g \circ f] \geq g(E_\eta[f])$.
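As a concrete instance of the inequality, the following sketch evaluates both sides exactly for a small finite distribution. The distribution, the function $f(x) = 2x + 1$ and the convex function $g(y) = y^8$ are arbitrary choices made for illustration (the eighth power is chosen only because that exponent recurs later in the paper).

```python
from fractions import Fraction


def expectation(dist, h):
    """Expectation of h under a finite distribution given as {outcome: probability}."""
    return sum(p * h(x) for x, p in dist.items())


# Illustrative data, not taken from the paper.
dist = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}


def f(x):
    return 2 * x + 1


def g(y):
    return y ** 8  # convex on the reals


lhs = expectation(dist, lambda x: g(f(x)))  # E[g(f)]
rhs = g(expectation(dist, f))               # g(E[f])
```

Here `expectation(dist, f)` equals 3, so `rhs` is $3^8 = 6561$, while `lhs` is far larger because the outcome $x = 3$ contributes $7^8/4$.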

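Lemma 16 (Proof in the Appendix, Section A) Let $X$ be a finite set, and let $Y_1, \ldots, Y_n$ be a family
of subsets of $X$. Set $\alpha = \frac{1}{n} \sum_{i=1}^{n} |Y_i|/|X|$. Let $k$ be a non-negative integer: $\frac{1}{n^k} \sum_{t \in [n]^k} \left| \bigcap_{i=1}^{k} Y_{t_i} \right| \geq
\alpha^k |X|$.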
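The statement can be checked by brute force on small instances. The sketch below assumes the reading $\alpha = \frac{1}{n}\sum_i |Y_i|/|X|$ and compares $\frac{1}{n^k}\sum_{t \in [n]^k} |\bigcap_i Y_{t_i}|$ against $\alpha^k |X|$ using exact rational arithmetic; the function name `lemma16_sides` and the sample family are hypothetical.

```python
from fractions import Fraction
from itertools import product


def lemma16_sides(X, Ys, k):
    """Return (lhs, rhs) of the averaged-intersection inequality for exponent k."""
    n = len(Ys)
    alpha = Fraction(sum(len(Y) for Y in Ys), n * len(X))
    total = 0
    for t in product(range(n), repeat=k):
        common = set(X)  # the empty intersection (k = 0) is all of X
        for idx in t:
            common &= Ys[idx]
        total += len(common)
    return Fraction(total, n ** k), alpha ** k * len(X)


# Illustrative family of subsets of a six-element ground set.
X = set(range(6))
Ys = [{0, 1, 2}, {2, 3}, {0, 2, 4, 5}, {1}]
```

For this family the two sides coincide at $k = 0$ and $k = 1$, and the left side is strictly larger from $k = 2$ onward, which is where convexity does the work.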
Lemma 17 (Proof in the Appendix, Section A) There exists a constant $c > 0$ so that for every
undirected graph $G = (V, E)$ with $|V| = N$ and $|E| \geq \alpha \binom{N}{2}$, we have that:

\[ \Pr_{u \in V^3}[K(\{u_1\}, \{u_2, u_3\}) \subseteq G] \geq \alpha^2 - (5/N) \]

\[ \Pr_{u \in V^6}[K(\{u_1, u_2\}, \{u_3, u_4, u_5, u_6\}) \subseteq G] \geq \alpha^8 - (23/N) \]
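The first bound can be checked by exhaustive enumeration on small graphs. This is a sketch under two assumptions that are not spelled out in the statement itself: that $\alpha$ is the edge density $|E|/\binom{N}{2}$, and that the event requires $u_1, u_2, u_3$ to be distinct (as in the proof in the Appendix). The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def star_probability(N, edges):
    """Probability that a uniform (u1, u2, u3) in V^3 has distinct entries
    and both {u1, u2} and {u1, u3} are edges."""
    E = {frozenset(e) for e in edges}
    hits = 0
    for u1, u2, u3 in product(range(N), repeat=3):
        if len({u1, u2, u3}) == 3 and frozenset((u1, u2)) in E and frozenset((u1, u3)) in E:
            hits += 1
    return Fraction(hits, N ** 3)


def edge_density(N, edges):
    """alpha = |E| / C(N, 2)."""
    return Fraction(len(set(map(frozenset, edges))), N * (N - 1) // 2)


def lower_bound(N, edges):
    """The claimed bound alpha^2 - 5/N."""
    return edge_density(N, edges) ** 2 - Fraction(5, N)


complete6 = list(combinations(range(6), 2))        # complete graph on 6 vertices
cycle8 = [(k, (k + 1) % 8) for k in range(8)]      # 8-cycle
```

For graphs this small the bound is weak or even negative; the point of the sketch is only to make the event and the quantity $\alpha$ concrete.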

Proposition: Let r/ be a probability measure on a space X, and let / : X ^ [0, 1] be measurable. 
For all e G [0, 1] and all c> r]{{x \ f{x) > iE^[/]}) > (1 - l/c)E^[/]. 

11 Distributions from DDWB Processes 

To prove the completeness lemma (Lemma 12) and the continuity lemma (Lemma 13), we make
some detailed calculations about the distribution $\mathcal{D}$. It seems that by moving to a slightly more
general framework, some of the calculations and case analyses are simplified. In Lemma 20 in
Section 12 we show that the distribution $\mathcal{D}$ falls into this framework and use the machinery of
DDWB processes developed in this section to finish the proofs of Lemma 12 and Lemma 13.

Definition 11.1 Let $t$ be an integer, $X_1, \ldots, X_t$ be sets, and let $S_i : \prod_{k=1}^{i-1} X_k \to \mathcal{P}(X_i)$ and
$F_i : \prod_{k=1}^{i-1} X_k \to \mathcal{P}(X_i)$ be families of maps with $i \in [t]$. Assume that for all $i = 1, \ldots, t$, and all
$(u_1, \ldots, u_{i-1}) \in \prod_{k=1}^{i-1} X_k$, $S_i(u_1, \ldots, u_{i-1}) \setminus F_i(u_1, \ldots, u_{i-1}) \neq \emptyset$.

The distribution given by the dependent domains with blocking process of $S$ and $F$ is the
distribution $\pi$ ($= \pi_{S,F}$) on $\prod_{i=1}^{t} X_i$ given by the random process that generates a sequence $(u_1, \ldots, u_t)$
as follows: For $i = 1, \ldots, t$, choose $u_i$ uniformly from $S_i(u_1, \ldots, u_{i-1}) \setminus F_i(u_1, \ldots, u_{i-1})$. The blockage
bound of a DDWB process $S, F$ is the smallest $\beta \geq 0$ so that for all $i = 1, \ldots, t$ and all $u \in \prod_{k=1}^{i-1} X_k$,
$|F_i(u)| \leq \beta |S_i(u)|$. The covering bound for $S, F$ is the largest $\kappa \in [0, 1]$ so that for all $i = 1, \ldots, t$
and all $u \in \prod_{k=1}^{i-1} X_k$, $|S_i(u) \setminus F_i(u)| \geq \kappa |X_i|$.
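To make the definition concrete, the sketch below enumerates the distribution of a small DDWB process exactly and computes its blockage and covering bounds by brute force. The example process, choosing $t = 3$ distinct elements of a set $G \subseteq [5]$ with $S_i = G$ and $F_i$ the set of earlier choices, is an illustrative assumption modelled loosely on how distinct indices are chosen elsewhere in the paper; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def ddwb_pmf(spaces, S, F):
    """Exact mass function of the DDWB process: coordinate i (0-based) is chosen
    uniformly from S(i, prefix) - F(i, prefix)."""
    pmf = {}

    def extend(prefix, prob):
        i = len(prefix)
        if i == len(spaces):
            pmf[prefix] = prob
            return
        allowed = S(i, prefix) - F(i, prefix)
        assert allowed, "the definition requires S_i minus F_i to be non-empty"
        for x in sorted(allowed):
            extend(prefix + (x,), prob / len(allowed))

    extend((), Fraction(1))
    return pmf


def bounds(spaces, S, F):
    """Blockage bound (max |F_i|/|S_i|) and covering bound (min |S_i - F_i|/|X_i|),
    taken over every coordinate and every prefix in the product space."""
    beta, kappa = Fraction(0), Fraction(1)
    for i in range(len(spaces)):
        for prefix in product(*spaces[:i]):
            s, f = S(i, prefix), F(i, prefix)
            beta = max(beta, Fraction(len(f), len(s)))
            kappa = min(kappa, Fraction(len(s - f), len(spaces[i])))
    return beta, kappa


# Illustrative process: three distinct elements of G, each X_i = {1, ..., 5}.
G = {1, 2, 3, 4}
spaces = [list(range(1, 6))] * 3


def S(i, prefix):
    return set(G)


def F(i, prefix):
    return set(prefix)


pmf = ddwb_pmf(spaces, S, F)
beta, kappa = bounds(spaces, S, F)
```

In this example every one of the $4 \cdot 3 \cdot 2 = 24$ reachable sequences has mass $1/24$, the blockage bound is $2/4$, and the covering bound is $2/5$.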

The following easy fact is the crux of an induction argument. 

Proposition: Let $\pi$ be the distribution on $\prod_{i=1}^{t} X_i$ given by the DDWB process $S, F$. For each
$a \in X_1$, the distribution $\pi^a$ is generated by the DDWB process on $\prod_{i=2}^{t} X_i$ given by $S^a_2, \ldots, S^a_t$,
$F^a_2, \ldots, F^a_t$. If the process $S, F$ has a blockage bound $\leq \beta$, then the process $S^a, F^a$ has a blockage
bound $\leq \beta$.

11.1 Loss of Expectation Lemma for DDWB Distributions 

The following lemma is used to pass density results for the uniform distribution, such as Lemma 17,
to certain DDWB distributions. This is how Lemma 12 will be proved. It is a simple but careful
combination of two observations: If the domains $S_i$ contain the support of a $[0, 1]$-valued function,
then uniformly selecting over the $S_i$'s (instead of all of $X_i$) will only increase the expectation. Of
course the blocking of the $F_i$'s could reduce the expectation, but for a DDWB with blockage bound
$\beta$, each coordinate that the event depends upon can reduce the expectation by at most $\beta$.

Lemma 18 Let $\prod_{i=1}^{t} X_i$ be a product space, and let $f : \prod_{i=1}^{t} X_i \to [0, 1]$ be a function that depends
upon at most $k$ coordinates, $i_1, \ldots, i_k$. Let $U$ be the uniform distribution on $\prod_{i=1}^{t} X_i$, and let $\pi$ be
a DDWB distribution on $\prod_{i=1}^{t} X_i$ given by some $S$ and $F$. If the following two conditions are
satisfied:

1. The DDWB process $S, F$ has blockage bound $\leq \beta$.

2. For all $a \in \prod_{i=1}^{t} X_i$, if $f(a) > 0$ then for all $j = 1, \ldots, k$, $a_{i_j} \in S_{i_j}(a_1, \ldots, a_{i_j - 1})$.

then $E_\pi[f] \geq E_U[f] - k\beta$.
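A small exact computation illustrates the conclusion $E_\pi[f] \geq E_U[f] - k\beta$. The sketch assumes an illustrative process (three distinct elements drawn from $G = \{1, \ldots, 16\} \subseteq [20]$, with $S_i = G$ and $F_i$ the earlier choices) and a test function $f$ that depends on $k = 2$ coordinates and vanishes outside $G$, so that the support condition of the lemma holds. All names are hypothetical.

```python
from fractions import Fraction
from itertools import product

X = list(range(1, 21))   # each X_i = {1, ..., 20}
G = set(range(1, 17))    # S_i = G for every coordinate
A = set(range(1, 13))    # f is supported on a subset of G
t, k = 3, 2


def ddwb_pmf():
    """Exact mass function: coordinate i is uniform on G minus the earlier choices."""
    pmf = {}

    def extend(prefix, prob):
        if len(prefix) == t:
            pmf[prefix] = prob
            return
        allowed = G - set(prefix)
        for x in sorted(allowed):
            extend(prefix + (x,), prob / len(allowed))

    extend((), Fraction(1))
    return pmf


def f(a):
    """Indicator depending only on coordinates 1 and 3."""
    return 1 if a[0] in A and a[2] in A else 0


pmf = ddwb_pmf()
E_pi = sum(p * f(a) for a, p in pmf.items())
E_U = Fraction(sum(f(a) for a in product(X, repeat=t)), len(X) ** t)
# At most two earlier choices are blocked out of |G| = 16, so beta = 2/16.
beta = Fraction(2, 16)
```

Here $E_U[f] = (12/20)^2 = 9/25$ and $E_\pi[f] = (12/16)(11/15) = 11/20$, so restricting to $G$ raises the expectation by more than the blocking costs, consistent with the two observations preceding the lemma.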

Proof: We prove the claim by induction on $k$. The lemma clearly holds for $k = 0$, as in that case
$f$ is constant over $\prod_{i=1}^{t} X_i$, and therefore $E_\pi[f] = E_U[f]$. We now assume that the lemma holds
for functions that depend on only $k$ coordinates, and demonstrate that it holds for functions that
depend on only $k + 1$ coordinates.

Let $t$, $\prod_{i=1}^{t} X_i$, $\pi$, $S$, $F$, and $\beta$ be given as in the statement of the lemma, with $f$ dependent only
upon $k + 1$ coordinates, $i_1, \ldots, i_{k+1}$. Let $i = i_1$ be the first coordinate upon which the function $f$
depends. Set $I = [i - 1]$ and $J = [t] \setminus [i]$. Let $X_I = \prod_{k \in I} X_k$ and $X_J = \prod_{k \in J} X_k$.

We reduce to the induction hypothesis by showing that for each $u \in X_I$, $a \in X_i$, the conditions of
the induction hypothesis are met for the function $f^{ua}$ with process $S^{ua}, F^{ua}$, and distribution $\pi^{ua}$.
Observe that the distribution $\pi^{ua}$ is given by the DDWB process $S^{ua}_{i+1}, \ldots, S^{ua}_t$ and $F^{ua}_{i+1}, \ldots, F^{ua}_t$, a
process with blockage bound $\leq \beta$ because $S, F$ has blockage bound $\leq \beta$. Moreover, the function
$f^{ua} : \prod_{j=i+1}^{t} X_j \to [0, 1]$ depends on at most $k$ coordinates. By specializing the hypothesis "for
all $a$, if $f(a) > 0$ then for all $j = 1, \ldots, k$, $a_{i_j} \in S_{i_j}(a_1, \ldots, a_{i_j - 1})$" to inputs with prefix $ua$ and
weakening its conclusion to cover only $j = 2, \ldots, k$, we have that "for all $b \in X_J$ so that $f(uab) > 0$,
for all $j = 2, \ldots, k$, $b_{i_j} \in S_{i_j}(u, a, b_{i+1}, \ldots, b_{i_j - 1})$". This is equivalent to "for all $b \in X_J$ so that
$f^{ua}(b) > 0$, for all $j = 2, \ldots, k$, $b_{i_j} \in S^{ua}_{i_j}(b_{i+1}, \ldots, b_{i_j - 1})$". Therefore by the induction hypothesis we
have that $E_{\pi^{ua}}[f^{ua}] \geq E_{U^{ua}}[f^{ua}] - k\beta$.

Furthermore, from the hypothesis "for all $u \in \prod_{i=1}^{t} X_i$, if $f(u) > 0$ then $\forall j \in [k + 1]$, $u_{i_j} \in
S_{i_j}(u_1, \ldots, u_{i_j - 1})$" we conclude that for all $v \in \prod_{j=1}^{i} X_j$ with $E_{U^v}[f^v] > 0$, $v_i \in S_i(v_1, \ldots, v_{i-1})$.
Therefore, for all $u = (u_1, \ldots, u_{i-1}) \in X_I$

We now bound the expectation of / with respect to tt from below: 

= E ^^(") E E ^"(«^)/(««&) 

E-TT /'^r^ XSi{u)\Fi(u) («) _uafj^^ ff;rt„t\ 

= E E ^^MffH E 
= E E '")r-';'!> ,..[r°] 

«ex, aes.w l^*^""^ ^^'^""^1 

^ E-(»-) E ^^K.4ri-*/^) 
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ueXi 

\[ = -(k + 1)\beta + \sum_{u \in X_I} \pi_I(u) E_U[f] = -(k + 1)\beta + E_U[f] \]

The penultimate equality holds because the function $f$ is independent of the coordinates of $I$, and
therefore, for all $u \in X_I$, $E_{U^u}[f^u] = E_U[f]$.



11.2 "Continuity" for DDWB Processes 

Lemma 19 Let $\pi$ be a distribution on the product space $\prod_{i=1}^{t} X_i$ given by a DDWB process $S, F$
with covering bound $\kappa$. Let $c$ and $d$ be arbitrary. Let $I_0 \subseteq [t]$ so that $|I_0| = d$. Let $u, v \in \prod_{i=1}^{t} X_i$
be arbitrary. If

1. $\pi(u) > 0$ and $\pi(v) > 0$

2. For all $i \in [t] \setminus I_0$, $S_i(u_1, \ldots, u_{i-1}) = S_i(v_1, \ldots, v_{i-1})$

3. For all $i \in [t] \setminus I_0$, $|F_i(u_1, \ldots, u_{i-1}) \oplus F_i(v_1, \ldots, v_{i-1})| \leq (c/t)|X_i|$

then $\pi(v) \leq \kappa^{-d} e^{c/\kappa} \pi(u)$.

Proof: Explicit calculation reveals that: 

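Write $S_i(u)$ for $S_i(u_1, \ldots, u_{i-1})$, and similarly for $F_i$ and for $v$.

\[ \frac{\pi(u)}{\pi(v)} = \frac{\prod_{i=1}^{t} 1/|S_i(u) \setminus F_i(u)|}{\prod_{i=1}^{t} 1/|S_i(v) \setminus F_i(v)|} = \prod_{i=1}^{t} \frac{|S_i(v) \setminus F_i(v)|}{|S_i(u) \setminus F_i(u)|} \]

\[ = \prod_{i \in I_0} \frac{|S_i(v) \setminus F_i(v)|}{|S_i(u) \setminus F_i(u)|} \cdot \prod_{i \in [t] \setminus I_0} \frac{|S_i(v) \setminus F_i(v)|}{|S_i(u) \setminus F_i(u)|} \]

\[ \leq \prod_{i \in I_0} \frac{|X_i|}{\kappa |X_i|} \cdot \prod_{i \in [t] \setminus I_0} \frac{|S_i(v) \setminus F_i(v)|}{|S_i(u) \setminus F_i(u)|} = \kappa^{-d} \prod_{i \in [t] \setminus I_0} \frac{|S_i(u) \setminus F_i(v)|}{|S_i(u) \setminus F_i(u)|} \]

\[ \leq \kappa^{-d} \prod_{i \in [t] \setminus I_0} \frac{|S_i(u) \setminus F_i(u)| + |F_i(v) \oplus F_i(u)|}{|S_i(u) \setminus F_i(u)|} \]

\[ \leq \kappa^{-d} \prod_{i \in [t] \setminus I_0} \left( 1 + \frac{(c/t)|X_i|}{\kappa |X_i|} \right) \leq \kappa^{-d} \left( 1 + \frac{c}{\kappa t} \right)^{t} \leq \kappa^{-d} e^{c/\kappa} \]

The hypotheses are symmetric in $u$ and $v$, so the same bound holds for $\pi(v)/\pi(u)$.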



12 The Distribution $\mathcal{D}$ is a DDWB Distribution

We give a DDWB process $S, F$ and show that it produces the distribution $\mathcal{D}$ used to generate
reduction layouts used in the reduction from set-disjointness to the FindBadEdge search lemma.
This enables us to use the machinery of DDWB distributions to prove Lemma 12 and Lemma 13.

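Definition 12.1 Let $(V_I, V_{II})$ be a partition of $MVars_m$. Let $G$, $N_2(\cdot)$, $N_3(\cdot)$ be as in Definition 8.1.
We define a DDWB process $S, F$ over the product space $[m]^{n+1} \times ([2m+1]^2)^n \times
([2m+1]^3) \times ([3m]^3)^{n+1}$ as follows:

1. When choosing $i_k$ given $i_1, \ldots, i_{k-1}$: $X_k = [m]$, $S_k = G$ and $F_k(i_1, \ldots, i_{k-1}) = \{i_1, \ldots, i_{k-1}\}$.

2. When choosing $(j_{k,1}, j_{k,2})$ given $i$, $(j_{1,1}, j_{1,2}), \ldots, (j_{k-1,1}, j_{k-1,2})$ (with $k \leq n$), we have $X_{n+1+k} =
[2m+1]^2$, $S_{n+1+k}(i, (j_{1,1}, j_{1,2}), \ldots, (j_{k-1,1}, j_{k-1,2})) = N_2(i_k)$, and:

\[ F_{n+1+k}(i, (j_{1,1}, j_{1,2}), \ldots, (j_{k-1,1}, j_{k-1,2})) = pm_{[2m+1]}(\{j_{1,1}, j_{1,2}, \ldots, j_{k-1,1}, j_{k-1,2}\}) \]

3. When choosing $(j_{n+1,1}, j_{n+1,2}, j_{n+1,3})$ given $i$, $(j_{1,1}, j_{1,2}), \ldots, (j_{n,1}, j_{n,2})$, we have $X_{2n+2} = [2m+1]^3$,
$S_{2n+2}(i, (j_{1,1}, j_{1,2}), \ldots, (j_{n,1}, j_{n,2})) = N_3(i_{n+1})$, and:

\[ F_{2n+2}(i, (j_{1,1}, j_{1,2}), \ldots, (j_{n,1}, j_{n,2})) = tm_{[2m+1]}(\{j_{1,1}, j_{1,2}, \ldots, j_{n,1}, j_{n,2}\}) \]

4. For $k \leq n$, when choosing $(u_k, v_k, w_k)$ given $i, j, (u_1, v_1, w_1), \ldots, (u_{k-1}, v_{k-1}, w_{k-1})$: $X_{2n+2+k} =
[3m]^3$, $S_{2n+2+k}(i, j, (u_1, v_1, w_1), \ldots, (u_{k-1}, v_{k-1}, w_{k-1})) = K_{1,2}(E_{i_k}[V_{j_{k,1}} \cap V_{j_{k,2}}])$, and
$F_{2n+2+k}(i, j, (u_1, v_1, w_1), \ldots, (u_{k-1}, v_{k-1}, w_{k-1})) = tm_{[3m]}(\{u_1, v_1, w_1, \ldots, u_{k-1}, v_{k-1}, w_{k-1}\})$.

5. When choosing $(u_{n+1}, v_{n+1}, w_{n+1})$ given $i, j, (u_1, v_1, w_1), \ldots, (u_n, v_n, w_n)$: $X_{3n+3} = [3m]^3$,
$S_{3n+3}(i, j, (u_1, v_1, w_1), \ldots, (u_n, v_n, w_n)) = K_{1,2}(E_{i_{n+1}}[V_{j_{n+1,1}} \cap V_{j_{n+1,2}} \cap V_{j_{n+1,3}}])$, and
$F_{3n+3}(i, j, (u_1, v_1, w_1), \ldots, (u_n, v_n, w_n)) = tm_{[3m]}(\{u_1, v_1, w_1, \ldots, u_n, v_n, w_n\})$.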

Lemma 20 Let $m \geq 450/\delta^2$. Let $(V_I, V_{II})$ be a partition of $MVars_m$. Let $\delta = \delta(V_I, V_{II})$ and let
$\gamma = (n+1)/m$. The distribution $\mathcal{D}(V_I, V_{II})$ is generated by the DDWB process $S, F$ over the product
space $[m]^{n+1} \times ([2m+1]^2)^n \times ([2m+1]^3) \times ([3m]^3)^{n+1}$. Moreover, this process has blockage bound
$\leq 30\gamma/\delta^2$ and it has covering bound $\geq \min\{\delta^2/10 - 3\gamma,\ \delta/3 - 3\gamma,\ \delta/12 - \gamma\}$.
Proof: That the DDWB process $S, F$ generates the distribution $\mathcal{D}$ follows immediately by comparing
the above functions with the experiment of Definition 8.2. The covering bounds follow
immediately from Lemma 11 and the blockage bounds are implicit in those calculations. ■

Corollary 21 If $\gamma = \delta^2/60$, then the covering bound of the process is $\geq \delta^2/20$, i.e. $\kappa \geq \delta^2/20$.
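The arithmetic behind the corollary can be checked with exact rationals. The sketch assumes that the covering bound of Lemma 20 is $\min\{\delta^2/10 - 3\gamma,\ \delta/3 - 3\gamma,\ \delta/12 - \gamma\}$ (the three terms as read from the case-by-case bounds in the Appendix) and that $\delta$ is a density in $(0, 1]$; the function name is hypothetical.

```python
from fractions import Fraction


def covering_lower_bound(delta, gamma):
    """Minimum of the three per-coordinate covering terms (assumed reading of Lemma 20)."""
    return min(delta ** 2 / 10 - 3 * gamma,
               delta / 3 - 3 * gamma,
               delta / 12 - gamma)


def check(delta):
    """With gamma = delta^2 / 60, the minimum should equal delta^2 / 20."""
    gamma = delta ** 2 / 60
    return covering_lower_bound(delta, gamma) == delta ** 2 / 20


results = [check(Fraction(num, 100)) for num in range(1, 101)]
```

The first term is the binding one: $\delta^2/10 - 3\delta^2/60 = \delta^2/20$, while the other two terms exceed $\delta^2/20$ whenever $\delta \leq 1$.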

Now we use Lemma 19 to prove the continuity lemma:

Proof: (of the continuity lemma, Lemma 13) Let $L = (i, j, u, v, w)$ and $L^* = (i^*, j^*, u^*, v^*, w^*)$ be
two reduction layouts from $\mathcal{L}$ with $HD(L, L^*) \leq d$. Let $S$ and $F$ be the DDWB process for
generating the distribution $\mathcal{D}$ as described in Definition 12.1. For the sake of brevity, in the scope
of this proof we will write $S_i(L)$ and $S_i(L^*)$ instead of writing the $S_i$'s with their proper arguments, e.g. $S_{2n+2+k}(L)$
instead of $S_{2n+2+k}(i, j, (u_1, v_1, w_1), \ldots, (u_{k-1}, v_{k-1}, w_{k-1}))$. We do the same with the $F_i$'s. We set
$I_0$ to be the set of indices $i$ so that $S_i(L) \neq S_i(L^*)$. Checking against the definitions of $S, F$,
a case analysis shows that $|I_0| \leq 2d$. We place this argument in the Appendix,
Section D, as Lemma 22.

We now check that the hypotheses of Lemma 19 are met with the process $S, F$ over $[m]^{n+1} \times
([2m+1]^2)^n \times ([2m+1]^3) \times ([3m]^3)^{n+1}$, with $t = 3n + 3$, with $\pi = \mu$, with $I_0$ as above, and with
$u = L^*$, $v = L$. By Lemma 20 and Corollary 21 the DDWB process generating $\mu$ has $\kappa \geq \delta^2/20$
where $\delta = \delta(V_I, V_{II})$ and $\gamma = (n+1)/m \leq \delta^2/60$.

Property 1: $\mu(L) > 0$ and $\mu(L^*) > 0$. This is satisfied because $L \in \mathcal{L}$ and $L^* \in \mathcal{L}$.

Property 2: The set $I_0$ is defined to be the set of $i$ with $S_i(L) \neq S_i(L^*)$.

Property 3: In the Appendix, Section D, we show that for all $i \in [t]$, $|F_i(L) \oplus F_i(L^*)| \leq (9d\gamma/(3n+3))|X_i|$.

By Lemma 19:

\[ \mu(L) \leq \kappa^{-2d} e^{9d\gamma/\kappa} \mu(L^*) \leq (\delta^2/20)^{-2d} e^{9d(\delta^2/60)/(\delta^2/20)} \mu(L^*) = (20/\delta^2)^{2d} e^{3d} \mu(L^*) \]

m 

Now we use Lemma 18 to prove the completeness lemma:

Proof: (of the completeness lemma, Lemma 12) Fix $m$, and let $(V_I, V_{II})$ be a partition of $MVars_m$
with $\delta = \delta(V_I, V_{II})$. Let $n$ be given so that $n \leq \delta^{10}/(2^{10} \cdot 3 \cdot 5^2) \cdot m$. Let $l \in [n]$ be given. Let $U$ be the
uniform distribution on $[m]^{n+1} \times ([2m+1]^2)^n \times ([2m+1]^3) \times ([3m]^3)^{n+1}$. Let $\mu$ be the mass function
for the distribution $\mathcal{D}$. Set $\beta$ to be the blockage bound for the DDWB process generating $\mathcal{D}$. Let
$A \subseteq [m]^{n+1} \times ([2m+1]^2)^n \times ([2m+1]^3) \times ([3m]^3)^{n+1}$ be the event that $(j_{n+1,2}, j_{l,1}, j_{l,2}) \in N_3(i_l)$,
$(j_{n+1,1}, j_{n+1,2}, j_{n+1,3}) \in N_3(i_{n+1})$, and $K(\{u_{n+1}, u_l\}, \{v_{n+1}, v_l, w_{n+1}, w_l\}) \subseteq E_{i_l}[V_{j_{l,1}} \cap V_{j_{l,2}}] \cap
E_{i_{n+1}}[V_{j_{n+1,1}} \cap V_{j_{n+1,2}} \cap V_{j_{n+1,3}}]$. Notice that $\mathcal{S}^l = \mathcal{L} \cap A$ and that because $\mu(\mathcal{L}) = 1$, $\mu(\mathcal{S}^l) =
\mu(\mathcal{L} \cap A) = \mu(A)$.

Let / denote the indices 1, . . . 2n + 2 (so that, using our abused notation, the coordinates of I 
correspond to i,fj. Let A C [m]"+-'^ x ([2m + 1]^)" x ([2m + 1]'^) be the event that ii,in+i G G, 
(jn+i,i,Jn+i,2, jn+1,3) G N3{in+i), and (jri+1,2, j;,i, j«,2) G ^^sik)- Notice that ADAi and therefore 

For each setting of $i$ and $j$, the event $A(i, j)$ depends only on the values of $(u_{n+1}, v_{n+1}, w_{n+1})$
and $(u_l, v_l, w_l)$. Moreover, in the event that $A$ holds, we have that $(u_l, v_l, w_l) \in K_{1,2}(E_{i_l}[V_{j_{l,1}} \cap V_{j_{l,2}}])$
and $(u_{n+1}, v_{n+1}, w_{n+1}) \in K_{1,2}(E_{i_{n+1}}[V_{j_{n+1,1}} \cap V_{j_{n+1,2}} \cap V_{j_{n+1,3}}])$. Therefore we can apply Lemma 18
and conclude for all $i, j$: $\mu^{ij}(A(i, j)) \geq U^{ij}(A(i, j)) - 2\beta$.

For each Tand j set D{i,j) = \Ei^ [Vj,^^ n Vj, ,] nEi^^, [Vj,^^,^, n Vj„^,^, n V,„^,^,] Notice 
that 6{Vi, Vii) is the expectation of D over the uniform distribution on [m]^ x [2m+l]^. Because the 
marginal distribution of on {ui,vi,wi) and {un+i,Vn+i,Wn+i) is just the uniform distribution 
[3m]^ X [3m]^, we can apply Lemma [T71 For each choice of i,j we have that W'^ {A{i,j)) > D(i,j)^ — 
(23/3m). Therefore: 



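\[ \mu(A) = \sum_{i,j} \mu_I(i, j)\, \mu^{ij}(A(i, j)) \geq \sum_{i,j} \mu_I(i, j) \left( U^{ij}(A(i, j)) - 2\beta \right) \geq -2\beta + \sum_{i,j} \mu_I(i, j)\, U^{ij}(A(i, j)) \]

\[ \geq -2\beta + \sum_{i,j} \mu_I(i, j) \left( D(i, j)^8 - (23/3m) \right) \geq -2\beta - (23/3m) + \sum_{i,j} \mu_I(i, j) \left( D(i, j) \cdot \chi_A(i, j) \right)^8 \]

\[ \geq -2\beta - (23/3m) + \Big( \sum_{i,j} \mu_I(i, j) D(i, j) \chi_A(i, j) \Big)^8 = -2\beta - (23/3m) + \left( E_{\mu_I}[D \cdot \chi_A] \right)^8 \]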

The final task is to get a lower bound for $E_{\mu_I}[D \cdot \chi_A]$. This will follow from an application of
Lemma 18. Let $U$ denote the uniform distribution over $i, j$. In the Appendix, Section D, Lemma 23,
it is shown that $E_U[D \cdot \chi_A] \geq \delta(V_I, V_{II})/2$. Notice that the function $D \cdot \chi_A$ depends only upon 4
coordinates: $i_l$, $i_{n+1}$, the triple $(j_{n+1,1}, j_{n+1,2}, j_{n+1,3})$ and the pair $(j_{l,1}, j_{l,2})$. Moreover, whenever
$D \cdot \chi_A > 0$, we have that $i_l \in G$, $i_{n+1} \in G$, $(j_{n+1,1}, j_{n+1,2}, j_{n+1,3}) \in N_3(i_{n+1})$, and $(j_{l,1}, j_{l,2}) \in N_2(i_l)$,
so we may apply Lemma 18 to conclude that $E_{\mu_I}[D \cdot \chi_A] \geq \delta/2 - 4\beta$. Therefore:

\[ \mu(A) \geq -2\beta - (23/3m) + (E_{\mu_I}[D \cdot \chi_A])^8 \geq -2\beta - (23/3m) + (\delta/2 - 4\beta)^8 \]
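The constants that finish this bound can be sanity-checked numerically. The sketch assumes $m \geq 31(2/\delta)^8$ and $\beta \leq \delta^8/(2^9 \cdot 5)$ (the values used in the remainder of the proof), takes both at their extreme values, and assumes $\delta \in (0, 1]$. It then compares $-2\beta - 23/(3m) + (\delta/2 - 4\beta)^8$ with a constant multiple of $(\delta/2)^8$. The function name and the grid of test values are hypothetical.

```python
from fractions import Fraction
from math import ceil


def final_lower_bound(delta):
    """Return (bound, (delta/2)^8) with m and beta at their extreme allowed values."""
    m = ceil(31 * (2 / delta) ** 8)      # smallest integer m with m >= 31 (2/delta)^8
    beta = delta ** 8 / (2 ** 9 * 5)     # largest allowed blockage bound
    half = (delta / 2) ** 8
    bound = -2 * beta - Fraction(23, 3 * m) + (delta / 2 - 4 * beta) ** 8
    return bound, half


ratios = []
for num in range(1, 11):
    bound, half = final_lower_bound(Fraction(num, 10))
    ratios.append(bound / half)
```

Under these assumptions $2\beta = 0.2(\delta/2)^8$, $23/(3m) \leq 0.25(\delta/2)^8$, and $(\delta/2 - 4\beta)^8 = (\delta/2)^8 (1 - \delta^7/(2^6 \cdot 5))^8 \geq 0.97(\delta/2)^8$, so every entry of `ratios` is above $0.52$.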

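Because $m \geq 31(2/\delta)^8$, $\frac{23}{3m} \leq 0.25(\delta/2)^8$. By Lemma 20, $\beta \leq 30\gamma/\delta^2 \leq 30(\delta^{10}/(2^{10} \cdot 3 \cdot 5^2))/\delta^2 =
\delta^8/(2^9 \cdot 5)$, therefore:

\[ \mu(A) \geq -\frac{2\delta^8}{2^9 \cdot 5} - 0.25\left(\frac{\delta}{2}\right)^8 + \left(\frac{\delta}{2} - \frac{4\delta^8}{2^9 \cdot 5}\right)^8 \]

\[ \geq -0.2\left(\frac{\delta}{2}\right)^8 - 0.25\left(\frac{\delta}{2}\right)^8 + \left(\frac{\delta}{2}\right)^8 \left(1 - \frac{\delta^7}{2^6 \cdot 5}\right)^8 \]

\[ \geq -0.45\left(\frac{\delta}{2}\right)^8 + 0.97\left(\frac{\delta}{2}\right)^8 \]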



13 Debriefing 

After digesting the proof of Theorem 8, the reader might notice that there was some overkill in a few
of the arguments, and wonder if a tighter argument could improve the constants of Theorem 8. This
seems likely; however, it was decided that optimizing between different values of "astronomical"
was not worth the added length.

There are two points in the argument particularly worthy of mention. The first is that Definition 9.3
is a bit stronger than is needed to prove Lemma 14, and it may be possible with a more
careful definition to reduce the exponent of 8 (which comes from trying to randomly find a $K_{2,4}$ in a
graph of edge density $\alpha$) to something smaller, like 4 or 6. This would clearly improve the bound in
Lemma 12. Furthermore, it might also allow a slackening of the definition of partition density,
Definition 5.2, so that a larger value is guaranteed by an analog to LemmaO Furthermore, the DDWB
machinery introduces a fair amount of slop because the blockage bounds (covering bounds) are
taken as a maximum (minimum) over all coordinates, whereas a more careful coordinate-wise
analysis of the particular transformation of Lemma O would improve the constants seen in Lemma 12
and Lemma 13. Of course, this would likely be a more lengthy analysis.
A Proofs and Calculations for Section 10

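Proof: (of Lemma 16) A standard application of the convexity of the function $x \mapsto x^k$. For each
$x \in X$, let $d_x = |\{i \in [n] \mid x \in Y_i\}|$. Set $d = \frac{1}{|X|} \sum_{x \in X} d_x$. We have that $d = \frac{1}{|X|} \sum_{x \in X} d_x =
\frac{1}{|X|} \sum_{i=1}^{n} |Y_i| = \alpha n$, and therefore by Jensen's Inequality:

\[ \frac{1}{n^k} \sum_{t \in [n]^k} \Big| \bigcap_{i=1}^{k} Y_{t_i} \Big| = \frac{1}{n^k} \sum_{x \in X} d_x^k \geq \frac{1}{n^k} |X|\, d^k = \frac{1}{n^k} |X| (\alpha n)^k = \alpha^k |X| \]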




Proof: (of Lemma 17)
1. Conditioned on the choice of $u_1$, the probability that $\{u_1, u_2\} \in E$ and $\{u_1, u_3\} \in E$ is
$\left(\frac{d(u_1)}{N}\right)^2$. Because $\frac{1}{N} \sum_u d(u) = \frac{2}{N} \alpha \binom{N}{2} = \alpha(N - 1)$, convexity shows that the probability that
$\{u_1, u_2\} \in E$ and $\{u_1, u_3\} \in E$ is at least $N^{-3} \cdot N(\alpha(N - 1))^2 = \alpha^2(1 - 2/N + 1/N^2)$. We
now subtract out the probability that $u_1, u_2, u_3$ are not all distinct, which is clearly no more
than $\binom{3}{2}/N = 3/N$, and we obtain the stated bound.

2. For each $u_1$ and $u_2$, let $D(u_1, u_2)$ be the number of common neighbors of $u_1$ and $u_2$. Because
the average degree of $G$ is $\alpha(N - 1)$, Lemma 16 shows that $\frac{1}{N^2} \sum_{u_1, u_2} D(u_1, u_2)/N \geq
\alpha^2((N - 1)/N)^2 \geq \alpha^2(1 - 2/N)$. Conditioned on the choice of $u_1, u_2$, the probability that
all edges are present is clearly $(D(u_1, u_2)/N)^4$. Apply Jensen's Inequality and we have that the
probability that all edges are present is at least $(\alpha^2(1 - 2/N))^4 = \alpha^8(1 - 2/N)^4 \geq \alpha^8(1 - 8/N)$.
We now subtract out the probability that $u_1, u_2, u_3, u_4, u_5, u_6$ are not all distinct, which is
clearly no more than $\binom{6}{2}/N = 15/N$, and we obtain the stated bound.



B Proofs and Calculations for Section 8

Proof: (of Lemma 9) Let $\delta = \delta(V_I, V_{II})$. Notice that when $m \geq 36/\delta \geq ((6/\delta) - 1)/2$, we have that
$3/(2m + 1) \leq \delta/2$. By Definition 5.2, we have that

„.piTTF^ ^ in(E..[v;.lnE„[i/,J)| = *f™) 

^ ^ ie[m]2 je[2m+l]^ k=l ^ ^ 



And therefore ^(am+i)^ SieH Sj'e[2m+i]3 ^ ^ ^isJI > K T)- Because the number of 

terms with ji = j2, j2 = js or ji = is at most 3m(2m + 1)^, such terms can contribute at most 
m(2m+l)j 3"'(2"' + l)^('r) = lisr (T) '° this sum, SO we have; 

E E n V,, n v,,]l > (S - 3/(2™ + D) (^^"') > (S/2) (^'^") 

2 distinct 

Combining this with the fact that for each $i \in [m]$, $|E_i| \leq \binom{3m}{2}$, by averaging, we have that
with probability at least $\delta/6$ over the choice of $i, j_1, j_2, j_3$, with $j_1, j_2, j_3$ all distinct, that $|E_i[V_{j_1} \cap
V_{j_2} \cap V_{j_3}]| \geq (\delta/3)\binom{3m}{2}$. Therefore, with probability at least $\delta/12$ over choices of $i$, there are at least
$(\delta/12)(2m + 1)^3$ many triples $j_1, j_2, j_3$ that are distinct and have $|E_i[V_{j_1} \cap V_{j_2} \cap V_{j_3}]| \geq (\delta/3)\binom{3m}{2}$.
Therefore, $|G| \geq (\delta/12)m$.

m 

Proof: (of Lemma 10) Let $L = (i, j, u, v, w)$ be a reduction layout, and let $X_1, \ldots, X_n, Y_1, \ldots, Y_n$
be a set intersection instance. Let $e$ be a bad edge of $A_{L,X,Y}$. We recall two useful definitions
for the proof of this lemma: From Definition 8.2 the planted edge under $X, Y, L$ is defined
as $pe(X, Y, L) = \{u_{n+1}, w_{n+1}\}$. From Definition 8.3 the assignment $A_{L,X,Y}$ is defined
as follows: We set $I = \{i_1, \ldots, i_{n+1}\}$, set $J = \{j_{1,1}, j_{1,2}, \ldots, j_{n,1}, j_{n,2}, j_{n+1,1}, j_{n+1,2}, j_{n+1,3}\}$, and set
$V = \{u_1, \ldots, u_{n+1}, v_1, \ldots, v_{n+1}, w_1, \ldots, w_{n+1}\}$. We set $\beta$ ($= \beta(L)$) to be the lexicographically first
assignment to the variables $\{x^i_e \mid i \in [m] - I, e \in [3m - V]^2\} \cup \{y^j_u \mid j \in [2m + 1] - J, u \in [3m] - V\}$,
so that $\beta$ defines a matching of size $m - n - 1$ and an independent set of size $2(m - n - 1)$.

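\[
A_{L,X,Y}(x^i_e) = \begin{cases}
\beta(x^i_e) & \text{if } i \in [m] - I \text{ and } e \in ([3m] - V)^2 \\
X_k & \text{if } i = i_k \text{ and } e = \{u_k, v_k\} \text{ for some } k \in [n] \\
\neg X_k & \text{if } i = i_k \text{ and } e = \{u_k, w_k\} \text{ for some } k \in [n] \\
1 & \text{if } i = i_{n+1} \text{ and } e = \{u_{n+1}, w_{n+1}\} \\
0 & \text{otherwise}
\end{cases}
\]

\[
A_{L,X,Y}(y^j_x) = \begin{cases}
\beta(y^j_x) & \text{if } j \in [2m + 1] - J \text{ and } x \in [3m] - V \\
1 & \text{if } j = j_{k,1} \text{ and } x = v_k \text{ for some } k \in [n] \\
Y_k & \text{if } j = j_{k,2} \text{ and } x = u_k \text{ for some } k \in [n] \\
\neg Y_k & \text{if } j = j_{k,2} \text{ and } x = w_k \text{ for some } k \in [n] \\
1 & \text{if } j = j_{n+1,1} \text{ and } x = u_{n+1} \\
1 & \text{if } j = j_{n+1,2} \text{ and } x = v_{n+1} \\
1 & \text{if } j = j_{n+1,3} \text{ and } x = w_{n+1} \\
0 & \text{otherwise}
\end{cases}
\]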

Let e be a bad edge for the assignment ^^xf Fii^st of all, because /3 sets no bad edges, 
e n y 7^ 0. Furthermore, for all e with |e n y| = 1 have ^ ^{x\) = for all i, so e C V. 
Finally, for e C y, with some A^^^(Xg) = 1, we have that for some k G [n], e = {u^^Vk} or 
e = {ttfc, Choose k so that e = {n^, Ufc} or e = {u^, iffc}. If /c = n + 1 then we must have that 
e = {«n+i, li'n+i}, and e the bad edge, so consider the case when k <n. 

Notice that for all i' ^ ik, ^ y (^^e ) ~ 0- other hand, e is a bad edge, so there is some 

x\ that gets set to 1, therefore ^^^^(x**^) = 1. 

We now rule out the case that e = {n^, u)fc}. Because ^ y^^^e^ = 1, we have by construction 
that Xk = 0. Because e is bad, for some A^ ^ = 1 and Aj^ ^ YiVwk) — 1- However, yt,^ 

i' 

and yii^ cannot both be set to 1. 

Suppose that e = {uk.,Vk}- Because A^ y(2^e ) = 1) have by construction that Xi = 1. If 
{Xi,Yi) = (1, 1), then the lemma holds. Otherwise, Yi = 0. But in this case, we have that for all j, 
A^ X Y^y^i^ ~ ^' contradiction to e being a bad edge. 



C Proofs and Calculations for Section 9

Proof: (of Lemma 11) For each $k = 1, \ldots, n$, as we choose $(j_{k,1}, j_{k,2})$ (and $(j_{n+1,1}, j_{n+1,2}, j_{n+1,3})$),
$|J| \leq 2n < 2(n + 1) = 2\gamma m$ and as we choose each $(u_k, v_k, w_k)$, $|V^*| \leq 3n < 3(n + 1) = 3\gamma m$.

1. By Lemma 9, $|G| \geq (\delta/12)m$. On the other hand, $|\{i_1, \ldots, i_{k-1}\}| \leq n < \gamma m$. Therefore,
$|G \setminus \{i_1, \ldots, i_{k-1}\}| \geq ((\delta/12) - \gamma)m$.

2. Because $|J| \leq 2n$, we have that $|pm_{[2m+1]}(J)| \leq 2n(2m + 1) + (2m + 1)2n < 2(2\gamma m)(2m + 1) =
2\gamma(2m)(2m + 1) < 2\gamma(2m + 1)^2$. Combining this with the fact that $i_k \in G$
and therefore $|N_2(i_k)| \geq |N_3(i_k)|/(2m + 1) \geq (\delta/3)(2m + 1)^2$, we have that $|N_2(i_k) \setminus pm_{[2m+1]}(J)| \geq
((\delta/3) - 2\gamma)(2m + 1)^2$.

3. Because $|J| \leq 2n$ we have that $|tm_{[2m+1]}(J)| \leq 3(2n)(2m + 1)^2 < 3(2\gamma m)(2m + 1)^2 =
3\gamma(2m)(2m + 1)^2 < 3\gamma(2m + 1)^3$. Because $i_p \in G$, $|N_3(i_p)| \geq (\delta/3)(2m + 1)^3$. Therefore:
$|N_3(i_p) \setminus tm_{[2m+1]}(J)| \geq ((\delta/3) - 3\gamma)(2m + 1)^3$.

4. Because $|V^*| \leq 3n$, $|tm(V^*)| \leq 3(3n)(3m)^2 < 3(3\gamma m)(3m)^2 = 3\gamma(3m)^3$. We now get a
lower bound on the size of $K_{1,2}(E_{i_k}[V_{j_{k,1}} \cap V_{j_{k,2}}])$. First, because $(j_{k,1}, j_{k,2}) \in N_2(i_k)$, there
exists some $j'$ with $|E_{i_k}[V_{j'} \cap V_{j_{k,1}} \cap V_{j_{k,2}}]| \geq (\delta/3)\binom{3m}{2}$, so we have that $|E_{i_k}[V_{j_{k,1}} \cap V_{j_{k,2}}]| \geq
(\delta/3)\binom{3m}{2}$. Feeding this lower bound on the edge density into Lemma 17, we have that:

\[ |K_{1,2}(E_{i_k}[V_{j_{k,1}} \cap V_{j_{k,2}}])| \geq (\delta^2/9 - (5/m)) \cdot (3m)^3 \]

Combining the upper bound on $|tm(V^*)|$ with the preceding lower bound:

\[ |K_{1,2}(E_{i_k}[V_{j_{k,1}} \cap V_{j_{k,2}}]) \setminus tm(V^*)| \geq ((\delta^2/9) - (5/m) - 3\gamma)(3m)^3 \]

Because $m \geq 450/\delta^2$, we have that $5/m \leq \delta^2/90$ and therefore the above quantity is $\geq
(\delta^2/9 - \delta^2/90 - 3\gamma)(3m)^3 = ((\delta^2/10) - 3\gamma)(3m)^3$.

5. This derivation is identical to the previous, except that it uses the lower bound of $|E_{i_p}[V_{j_{p,1}} \cap
V_{j_{p,2}} \cap V_{j_{p,3}}]| \geq (\delta/3)\binom{3m}{2}$ that holds because $(j_{p,1}, j_{p,2}, j_{p,3}) \in N_3(i_p)$.



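Proof: (details for Lemma 14)

The proof that $f$ is an involution. Let $L = (i, j, u, v, w)$ be a reduction layout, let
$(i^*, j^*, u^*, v^*, w^*) = f(L)$, and let $(i^{**}, j^{**}, u^{**}, v^{**}, w^{**}) = f(f(L))$. Applying the definitions shows
that:

\[
i^{**}_k = \begin{cases} i^*_{n+1} = i_l & \text{if } k = l \\ i^*_l = i_{n+1} & \text{if } k = n + 1 \\ i^*_k = i_k & \text{otherwise} \end{cases}
\]

\[
j^{**}_{k,1} = \begin{cases} j^*_{n+1,3} = j_{l,1} & \text{if } k = l \\ j^*_{l,2} = j_{n+1,1} & \text{if } k = n + 1 \\ j^*_{k,1} = j_{k,1} & \text{otherwise} \end{cases}
\qquad
j^{**}_{k,2} = \begin{cases} j^*_{n+1,1} = j_{l,2} & \text{if } k = l \\ j^*_{k,2} = j_{k,2} & \text{otherwise} \end{cases}
\]

\[ j^{**}_{n+1,3} = j^*_{l,1} = j_{n+1,3} \]

\[
u^{**}_k = \begin{cases} u^*_l = u_{n+1} & \text{if } k = n + 1 \\ u^*_{n+1} = u_l & \text{if } k = l \\ u^*_k = u_k & \text{otherwise} \end{cases}
\]

\[
v^{**}_k = \begin{cases} w^*_{n+1} = v_l & \text{if } k = l \\ v^*_k = v_k & \text{otherwise} \end{cases}
\qquad
w^{**}_k = \begin{cases} v^*_l = w_{n+1} & \text{if } k = n + 1 \\ w^*_k = w_k & \text{otherwise} \end{cases}
\]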



The proof that $A_{L,X,Y} = A_{f(L),X,Y}$. We expand the definitions of $A_{L,X,Y}$ and $A_{f(L),X,Y}$ from
Definition 8.3. Notice that $\{i_1, \ldots, i_{n+1}\} = \{i^*_1, \ldots, i^*_{n+1}\}$, $\{j_{1,1}, j_{1,2}, \ldots, j_{n,1}, j_{n,2}, j_{n+1,1}, j_{n+1,2}, j_{n+1,3}\} =
\{j^*_{1,1}, j^*_{1,2}, \ldots, j^*_{n,1}, j^*_{n,2}, j^*_{n+1,1}, j^*_{n+1,2}, j^*_{n+1,3}\}$, and $\{u_1, \ldots, u_{n+1}, v_1, \ldots, v_{n+1}, w_1, \ldots, w_{n+1}\} = \{u^*_1, \ldots, u^*_{n+1},
v^*_1, \ldots, v^*_{n+1}, w^*_1, \ldots, w^*_{n+1}\}$. Let $I$, $J$, and $V$ respectively denote these three sets. Because $\beta(L)$ and
$\beta(L^*)$ are both the lexicographically first assignment to the variables

\[ \{x^i_e \mid i \in [m] - I, e \in [3m - V]^2\} \cup \{y^j_u \mid j \in [2m + 1] - J, u \in [3m] - V\} \]

so that $\beta$ defines a matching of size $m - n - 1$ and an independent set of size $2(m - n - 1)$, we have
that $\beta(L) = \beta(L^*)$. Write $\beta$ for this assignment. We compare $A_{L,X,Y}$ and $A_{f(L),X,Y}$ directly:



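\[
A_{L,X,Y}(x^i_e) = \begin{cases}
\beta(x^i_e) & \text{if } i \in [m] - I \text{ and } e \in ([3m] - V)^2 \\
X_k & \text{if } i = i_k \text{ and } e = \{u_k, v_k\} \text{ for some } k \in [n] \setminus \{l\} \\
\neg X_k & \text{if } i = i_k \text{ and } e = \{u_k, w_k\} \text{ for some } k \in [n] \setminus \{l\} \\
1\ (= X_l) & \text{if } i = i_l \text{ and } e = \{u_l, v_l\} \\
0\ (= \neg X_l) & \text{if } i = i_l \text{ and } e = \{u_l, w_l\} \\
1 & \text{if } i = i_{n+1} \text{ and } e = \{u_{n+1}, w_{n+1}\} \\
0 & \text{otherwise}
\end{cases}
\]

\[
A_{f(L),X,Y}(x^i_e) = \begin{cases}
\beta(x^i_e) & \text{if } i \in [m] - I \text{ and } e \in ([3m] - V)^2 \\
X_k & \text{if } i = i_k \text{ and } e = \{u_k, v_k\} \text{ for some } k \in [n] \setminus \{l\} \\
\neg X_k & \text{if } i = i_k \text{ and } e = \{u_k, w_k\} \text{ for some } k \in [n] \setminus \{l\} \\
1 & \text{if } i = i_l\ (= i^*_{n+1}) \text{ and } e = \{u_l, v_l\}\ (= \{u^*_{n+1}, w^*_{n+1}\}) \\
0 & \text{if } i = i_l\ (= i^*_{n+1}) \text{ and } e = \{u_l, w_l\}\ (= \{u^*_{n+1}, w^*_l\}) \\
1\ (= X_l) & \text{if } i = i_{n+1}\ (= i^*_l) \text{ and } e = \{u_{n+1}, w_{n+1}\}\ (= \{u^*_l, v^*_l\}) \\
0 & \text{otherwise}
\end{cases}
\]

\[
A_{L,X,Y}(y^j_x) = \begin{cases}
\beta(y^j_x) & \text{if } j \in [2m + 1] - J \text{ and } x \in [3m] - V \\
1 & \text{if } j = j_{k,1} \text{ and } x = v_k \text{ for some } k \in [n] \\
Y_k & \text{if } j = j_{k,2} \text{ and } x = u_k \text{ for some } k \in [n] \setminus \{l\} \\
\neg Y_k & \text{if } j = j_{k,2} \text{ and } x = w_k \text{ for some } k \in [n] \setminus \{l\} \\
1\ (= Y_l) & \text{if } j = j_{l,2} \text{ and } x = u_l \\
0\ (= \neg Y_l) & \text{if } j = j_{l,2} \text{ and } x = w_l \\
1 & \text{if } j = j_{n+1,1} \text{ and } x = u_{n+1} \\
1 & \text{if } j = j_{n+1,2} \text{ and } x = v_{n+1} \\
1 & \text{if } j = j_{n+1,3} \text{ and } x = w_{n+1} \\
0 & \text{otherwise}
\end{cases}
\]

\[
A_{f(L),X,Y}(y^j_x) = \begin{cases}
\beta(y^j_x) & \text{if } j \in [2m + 1] - J \text{ and } x \in [3m] - V \\
1 & \text{if } j = j_{k,1} \text{ and } x = v_k \text{ for some } k \in [n] \\
Y_k & \text{if } j = j_{k,2} \text{ and } x = u_k \text{ for some } k \in [n] \setminus \{l\} \\
\neg Y_k & \text{if } j = j_{k,2} \text{ and } x = w_k \text{ for some } k \in [n] \setminus \{l\} \\
1 & \text{if } j = j_{l,2}\ (= j^*_{n+1,1}) \text{ and } x = u_l\ (= u^*_{n+1}) \\
0 & \text{if } j = j_{l,2}\ (= j^*_{n+1,1}) \text{ and } x = w_l\ (= w^*_l) \\
1\ (= Y_l) & \text{if } j = j_{n+1,1}\ (= j^*_{l,2}) \text{ and } x = u_{n+1} = u^*_l \\
1 & \text{if } j = j_{n+1,2}\ (= j^*_{n+1,2}) \text{ and } x = v_{n+1} = v^*_{n+1} \\
1 & \text{if } j = j_{n+1,3}\ (= j^*_{l,1}) \text{ and } x = w_{n+1} = v^*_l \\
0 & \text{otherwise}
\end{cases}
\]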
D Proofs and Calculations for Section 12
Lemma 22 If $L$ and $L^*$ are reduction layouts with $HD(L, L^*) \leq d$, then there are at most $2d$
positions $i$ with $S_i(L) \neq S_i(L^*)$.

Proof: Let $L = (i, j, u, v, w)$ and let $L^* = (i^*, j^*, u^*, v^*, w^*)$. We consider each position where $L$
and $L^*$ might differ and see how each affects the functions $S$ given in Definition 12.1.

1. If $i_k \neq i^*_k$, with $k \leq n$, then we might have that $S_{n+1+k}(L) = N_2(i_k) \neq N_2(i^*_k) = S_{n+1+k}(L^*)$,
or that $S_{2n+2+k}(L) = K_{1,2}(E_{i_k}[V_{j_{k,1}} \cap V_{j_{k,2}}]) \neq K_{1,2}(E_{i^*_k}[V_{j^*_{k,1}} \cap V_{j^*_{k,2}}]) = S_{2n+2+k}(L^*)$.

2. If $i_{n+1} \neq i^*_{n+1}$ then we might have that $S_{2n+2}(L) = N_3(i_{n+1}) \neq N_3(i^*_{n+1}) = S_{2n+2}(L^*)$, or
that $S_{3n+3}(L) = K_{1,2}(E_{i_{n+1}}[V_{j_{n+1,1}} \cap V_{j_{n+1,2}} \cap V_{j_{n+1,3}}]) \neq K_{1,2}(E_{i^*_{n+1}}[V_{j^*_{n+1,1}} \cap V_{j^*_{n+1,2}} \cap V_{j^*_{n+1,3}}]) =
S_{3n+3}(L^*)$.

3. If, for some $k \leq n$, $(j_{k,1}, j_{k,2}) \neq (j^*_{k,1}, j^*_{k,2})$ then we might have that $S_{2n+2+k}(L) = K_{1,2}(E_{i_k}[V_{j_{k,1}} \cap
V_{j_{k,2}}]) \neq K_{1,2}(E_{i^*_k}[V_{j^*_{k,1}} \cap V_{j^*_{k,2}}]) = S_{2n+2+k}(L^*)$.

4. If $(j_{n+1,1}, j_{n+1,2}, j_{n+1,3}) \neq (j^*_{n+1,1}, j^*_{n+1,2}, j^*_{n+1,3})$ then we might have that

\[ S_{3n+3}(L) = K_{1,2}(E_{i_{n+1}}[V_{j_{n+1,1}} \cap V_{j_{n+1,2}} \cap V_{j_{n+1,3}}]) \neq K_{1,2}(E_{i^*_{n+1}}[V_{j^*_{n+1,1}} \cap V_{j^*_{n+1,2}} \cap V_{j^*_{n+1,3}}]) = S_{3n+3}(L^*) \]

5. Differences between $(u_k, v_k, w_k)$ and $(u^*_k, v^*_k, w^*_k)$ do not affect any of the $S_i$'s.

Proof: (The calculations ensuring Property [3l of Lemma [19] as applied in the proof of Lemma [13] .) 
1. Coordinates 1, . . . n + 1: Fk{ii, ■ ■ ■ ik-i) = {^1, • • • ik-i} and = [m], therefore: 

\Fk{L) ® Fk{L*)\ = |{ii,...4-i}eK,...4_i}| <d 

d 3n + 3 d 3jm 3^7 

m = m = |Afc| 



3n + 3 m 3n + 3 m 3n + 3 

2. For coordinates n + 2, . . . 2n + 1, Xn+i^^ = + 1]^ and 

Fn+l+k{h{h,l,jl,2),---{3k~l,l,jk~l,2)) = Pm[2ni+1] ({jl,l , Jl,2 , • • • , ife-1,2}) 

-?^n+l+fc(^; ^1,2); ■ ■ ■ (ifc-l,l'ifc-l,2)) = P^[2m+1] ({jl,l ' il,2 ' • • • ifc-1,1 ; ifc-1,2}) 

Notice that for any X,Y, pm[2rn+i]i^) © ^^-[2™+!] (^) ^ P^'-pm+i] (-^ © the other 

hand, HD{L,L*) < d, so |{ji,i, ji,2, • • • jfc-i,i, ifc-1,2} © {j^i' ii,2' • • • ifc-i,i'-?l-i,2}l ^ ^d, and 
therefore 

|F„+i+fc(L)©F„+i+,(L*)| < 2.2d-(2m + l) = ^^-^^^-^(3n + 3)(2m + l)2 

4(i 37m 4d 37m I I 6^7 I I 



3n + 32m + l' 3n + 3 2m ' ' 3n + 3' 
3. At coordinate $2n+2$, $X_{2n+2} = [2m+1]^3$ and

$$F_{2n+2}(\vec{i}, (j_{1,1}, j_{1,2}), \dots, (j_{n,1}, j_{n,2})) = \mathit{tm}_{[2m+1]}(\{j_{1,1}, j_{1,2}, \dots, j_{n,1}, j_{n,2}\})$$

$$F_{2n+2}(\vec{i}^*, (j^*_{1,1}, j^*_{1,2}), \dots, (j^*_{n,1}, j^*_{n,2})) = \mathit{tm}_{[2m+1]}(\{j^*_{1,1}, j^*_{1,2}, \dots, j^*_{n,1}, j^*_{n,2}\})$$

Notice that for any $X, Y$, $\mathit{tm}_{[2m+1]}(X) \oplus \mathit{tm}_{[2m+1]}(Y) \subseteq \mathit{tm}_{[2m+1]}(X \oplus Y)$. On the other hand, $HD(L, L^*) \le d$, so $|\{j_{1,1}, j_{1,2}, \dots, j_{n,1}, j_{n,2}\} \oplus \{j^*_{1,1}, j^*_{1,2}, \dots, j^*_{n,1}, j^*_{n,2}\}| \le 2d$, and therefore

$$|F_{2n+2}(L) \oplus F_{2n+2}(L^*)| \le 3 \cdot 2d \cdot (2m+1)^2 = \frac{6d}{(3n+3)(2m+1)} (3n+3)(2m+1)^3$$

$$= \frac{6d}{3n+3} \cdot \frac{3n+3}{2m+1} (2m+1)^3 = \frac{6d}{3n+3} \cdot \frac{3\gamma m}{2m+1} (2m+1)^3$$

$$\le \frac{6d}{3n+3} \cdot \frac{3\gamma m}{2m} (2m+1)^3 = \frac{9d\gamma}{3n+3} (2m+1)^3$$

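4. For coordinates $2n+3, \dots, 3n+3$, $X_{2n+2+k} = [3m]^3$ and

$$F_{2n+2+k}(\vec{i}, \vec{j}, (u_1, v_1, w_1), \dots, (u_{k-1}, v_{k-1}, w_{k-1})) = \mathit{tm}_{[3m]}(\{u_1, v_1, w_1, \dots, u_{k-1}, v_{k-1}, w_{k-1}\})$$

Notice that for any finite sets $X$ and $Y$: $\mathit{tm}_{[3m]}(X) \oplus \mathit{tm}_{[3m]}(Y) \subseteq \mathit{tm}_{[3m]}(X \oplus Y)$.

$$|F_{2n+2+k}(L) \oplus F_{2n+2+k}(L^*)|$$

$$= |\mathit{tm}_{[3m]}(\{u_1, v_1, w_1, \dots, u_{k-1}, v_{k-1}, w_{k-1}\}) \oplus \mathit{tm}_{[3m]}(\{u^*_1, v^*_1, w^*_1, \dots, u^*_{k-1}, v^*_{k-1}, w^*_{k-1}\})|$$

$$\le |\mathit{tm}_{[3m]}(\{u_1, v_1, w_1, \dots, u_{k-1}, v_{k-1}, w_{k-1}\} \oplus \{u^*_1, v^*_1, w^*_1, \dots, u^*_{k-1}, v^*_{k-1}, w^*_{k-1}\})|$$

$$\le 3 \cdot 3d \cdot (3m)^2 = \frac{9d}{3n+3} \cdot \frac{3n+3}{3m} (3m)^3 = \frac{9d}{3n+3} \cdot \frac{3\gamma m}{3m} (3m)^3 = \frac{9d\gamma}{3n+3} (3m)^3$$

Therefore, for every $i = 1, \dots, 3n$, $|F_i(L) \oplus F_i(L^*)| \le \frac{9d\gamma}{3n+3} |X_i|$.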
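The two ingredients used at every coordinate above are an inclusion for symmetric differences and a counting bound. The following is a minimal sketch that checks both by brute force on small universes, and then checks the constants 3, 6, 9, 9 with exact rationals. It rests on two assumptions that are not stated in this section: `tuples_touching` is a hypothetical stand-in for the operators written pm and tm (taken here to mean "all ordered tuples with at least one coordinate in the given set"), and $3n+3$ is replaced by $3\gamma m$, as the chains of equalities suggest.

```python
from fractions import Fraction
from itertools import product
import random


def tuples_touching(size, subset, arity):
    """Ordered arity-tuples over {1..size} with at least one coordinate in subset.

    ASSUMPTION: a guessed stand-in for the operators pm (arity 2) and tm (arity 3).
    """
    subset = set(subset)
    universe = range(1, size + 1)
    return {t for t in product(universe, repeat=arity) if subset.intersection(t)}


def inclusion_and_size_hold(size, arity, trials=100, seed=0):
    """Test op(X) xor op(Y) <= op(X xor Y) and |op(S)| <= arity*|S|*size^(arity-1)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = {v for v in range(1, size + 1) if rng.random() < 0.5}
        y = {v for v in range(1, size + 1) if rng.random() < 0.5}
        lhs = tuples_touching(size, x, arity) ^ tuples_touching(size, y, arity)
        rhs = tuples_touching(size, x ^ y, arity)
        if not lhs <= rhs:
            return False
        if len(rhs) > arity * len(x ^ y) * size ** (arity - 1):
            return False
    return True


def coordinate_constants_hold(d, gamma, m):
    """Check the four chains, with 3n + 3 replaced by 3 * gamma * m (assumption)."""
    three_n_plus_3 = 3 * Fraction(gamma) * m
    unit = Fraction(d) * Fraction(gamma) / three_n_plus_3  # d*gamma / (3n+3)
    return (
        d <= 3 * unit * m
        and 2 * 2 * d * (2 * m + 1) <= 6 * unit * (2 * m + 1) ** 2
        and 3 * 2 * d * (2 * m + 1) ** 2 <= 9 * unit * (2 * m + 1) ** 3
        and 3 * 3 * d * (3 * m) ** 2 <= 9 * unit * (3 * m) ** 3
    )
```

Under these assumptions the first and last chains hold with equality, while the middle two have slack exactly because $2m \le 2m+1$.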



Lemma 23 Let $\delta > 0$ be given, and let $m$ be an integer $\ge 36/\delta$. Let $(V_I, V_{II})$ be a partition of $MVars_m$, with $\delta(V_I, V_{II}) \ge \delta$. Let $D$ be as in the Proof of LemmaMSX Let $G$, $N_2$ and $N_3$ be as in Definition 8.1. Let $U$ be the uniform distribution over $i_{n+1}, i_1 \in [m]$, $(j_{n+1,1}, j_{n+1,2}, j_{n+1,3}) \in [2m+1]^3$, and $(j_{1,1}, j_{1,2}) \in [2m+1]^2$. Let $A$ be the event that $i_{n+1} \in G$, $i_1 \in G$, $(j_{n+1,1}, j_{n+1,2}, j_{n+1,3}) \in N_3(i_{n+1})$, and $(j_{n+1,2}, j_{1,1}, j_{1,2}) \in N_3(i_1)$.

$$E_U[D \cdot \chi_A] \ge \delta(V_I, V_{II})/2$$

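Proof: Let $B_0$ be the event that either $j_{1,1} = j_{1,2}$, $j_{n+1,1} = j_{n+1,2}$, $j_{n+1,2} = j_{n+1,3}$, $j_{n+1,3} = j_{n+1,1}$, $j_{1,1} = j_{n+1,2}$, or $j_{1,2} = j_{n+1,2}$. Let $B_1$ be the event that $i_{n+1} \notin G$, let $B_2$ be the event that $i_1 \notin G$, let $B_3$ be the event that $(j_{n+1,1}, j_{n+1,2}, j_{n+1,3}) \notin N_3(i_{n+1})$, and let $B_4$ be the event that $(j_{n+1,2}, j_{1,1}, j_{1,2}) \notin N_3(i_1)$. For each $i = 0, \dots, 4$, let $B_i^* = B_i \cap \bigcap_{j=0}^{i-1} \overline{B_j}$. Because the $B_i^*$'s partition $\overline{A}$, we have that:

$$E_U[D] = E_U[D \cdot \chi_A] + \sum_{i=0}^{4} E_U[D \cdot \chi_{B_i^*}]$$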
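The decomposition relies only on the pieces $B_i^*$ being pairwise disjoint and on linearity of expectation. As a minimal sketch on a toy sample space (the outcomes, the events and the function `values` standing in for $D$ are all made up for illustration, and $A$ is taken to be everything outside the union of the events), one can check the identity exactly:

```python
from fractions import Fraction


def disjointify(events):
    """Return the pieces B_i* = B_i minus every earlier B_j, in order."""
    seen = set()
    pieces = []
    for event in events:
        event = set(event)
        pieces.append(event - seen)
        seen |= event
    return pieces


def mean_on(values, outcomes):
    """E[D * indicator(outcomes)] under the uniform distribution on the keys of values."""
    return Fraction(sum(values[o] for o in outcomes), len(values))


# Toy data: 12 equally likely outcomes and an arbitrary nonnegative function on them.
values = {o: o % 4 for o in range(12)}
events = [{0, 1, 2}, {2, 3, 4}, {4, 5}, {1, 5, 6}]
pieces = disjointify(events)
rest = set(values) - set().union(*pieces)  # plays the role of the event A
total = mean_on(values, values)
split = mean_on(values, rest) + sum(mean_on(values, p) for p in pieces)
```

Here `total` and `split` agree, which is the toy analogue of the displayed identity.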
Set $\delta^* = \delta(V_I, V_{II})$. The calculations below show that $U(B_0) \le 6/(2m+1)$ and for each $i = 1, \dots, 4$, $E_U[D \cdot \chi_{B_i^*}] \le (5\delta^*/12) U(B_i^*)$. Modulo those calculations, we have the lemma:

$$E_U[D \cdot \chi_A] = E_U[D] - \sum_{i=0}^{4} E_U[D \cdot \chi_{B_i^*}] \ge \delta^* - 6/(2m+1) - \sum_{i=1}^{4} (5\delta^*/12) U(B_i^*)$$

$$\ge \delta^* - 5\delta^*/12 - 6/(2m+1) \ge 7\delta^*/12 - 6/(2(36/\delta)) = 7\delta^*/12 - \delta/12 \ge \delta^*/2$$
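The last line of the chain is plain arithmetic in $\delta$, $\delta^*$ and $m$. A minimal sketch with exact rationals, assuming only the hypotheses $0 < \delta \le \delta^*$ and $m \ge 36/\delta$ (the function names are illustrative):

```python
from fractions import Fraction


def chain_lower_bound(delta_star, m):
    """delta* - 5*delta*/12 - 6/(2m+1), the middle expression of the chain."""
    delta_star = Fraction(delta_star)
    return delta_star - Fraction(5, 12) * delta_star - Fraction(6, 2 * m + 1)


def chain_holds(delta, delta_star, m):
    """Check that the middle expression is at least delta*/2 under the hypotheses."""
    delta, delta_star = Fraction(delta), Fraction(delta_star)
    if not (0 < delta <= delta_star and m >= 36 / delta):
        raise ValueError("hypotheses 0 < delta <= delta* and m >= 36/delta not met")
    return chain_lower_bound(delta_star, m) >= delta_star / 2
```

The tight case is $\delta^* = \delta$ and $m = 36/\delta$, where $6/(2m+1)$ is just below $\delta/12$. For a much smaller $m$ the middle expression can drop below $\delta^*/2$, which is why the lower bound on $m$ is needed.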

For each of the six pairs $j_{1,1}$ and $j_{1,2}$, $j_{n+1,1}$ and $j_{n+1,2}$, $j_{n+1,2}$ and $j_{n+1,3}$, $j_{n+1,3}$ and $j_{n+1,1}$, $j_{n+1,2}$ and $j_{1,1}$, and $j_{n+1,2}$ and $j_{1,2}$, there is a collision with probability $1/(2m+1)$. Therefore by the union bound, $U(B_0^*) = U(B_0) \le 6/(2m+1)$. We now bound the expectation over the pieces $B_1^*$, $B_2^*$, $B_3^*$, and $B_4^*$. Because these events are contained in $\overline{B_0}$, for elements drawn from these sets, the tuples $(j_{1,1}, j_{1,2})$, $(j_{n+1,1}, j_{n+1,2}, j_{n+1,3})$, and $(j_{n+1,2}, j_{1,1}, j_{1,2})$ each contain distinct elements. To denote this, we will use $Z$ to denote the set of pairs of tuples $(\vec{j}_{n+1}, \vec{j}_1)$ with $j_{1,1} \neq j_{1,2}$, $j_{n+1,1} \neq j_{n+1,2}$, $j_{n+1,2} \neq j_{n+1,3}$, $j_{n+1,3} \neq j_{n+1,1}$, $j_{n+1,2} \neq j_{1,1}$, and $j_{n+1,2} \neq j_{1,2}$; let $[2m+1]_2$ denote all ordered pairs from $[2m+1]$ with two distinct values and let $[2m+1]_3$ denote all ordered triples from $[2m+1]$ with three distinct values. Finally, set $M = m^2(2m+1)^5$,
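The union bound on $U(B_0)$ can be checked by exhaustive enumeration for small $m$. The sketch below is an illustration only: it draws the triple $(j_{n+1,1}, j_{n+1,2}, j_{n+1,3})$ and the pair $(j_{1,1}, j_{1,2})$ uniformly from $[2m+1]$ (the variable names `a1, a2, a3, b1, b2` are stand-ins for those five coordinates) and counts how often one of the six listed pairs collides.

```python
from fractions import Fraction
from itertools import product


def collision_probability(m):
    """Exact probability of the event B_0 for a given m, by enumeration."""
    size = 2 * m + 1
    bad = 0
    total = 0
    # (a1, a2, a3) plays the role of (j_{n+1,1}, j_{n+1,2}, j_{n+1,3});
    # (b1, b2) plays the role of (j_{1,1}, j_{1,2}).
    for a1, a2, a3, b1, b2 in product(range(size), repeat=5):
        total += 1
        if b1 == b2 or a1 == a2 or a2 == a3 or a3 == a1 or a2 == b1 or a2 == b2:
            bad += 1
    return Fraction(bad, total)


def closed_form(m):
    """1 - P(no collision), where P(no collision) = (2m(2m-1))^2 / (2m+1)^4."""
    return 1 - Fraction((2 * m * (2 * m - 1)) ** 2, (2 * m + 1) ** 4)
```

The union bound $6/(2m+1)$ only becomes nontrivial (below 1) once $m \ge 3$, and for large $m$ it is close to the exact value.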



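$$\begin{aligned}
E_U[D \cdot \chi_{B_1^*}] &= \frac{1}{M} \sum_{i_{n+1} \notin G} \sum_{i_1 \in [m]} \sum_{(\vec{j}_{n+1}, \vec{j}_1) \in Z} D(i_{n+1}, i_1, \vec{j}_{n+1}, \vec{j}_1) \\
&\le \frac{1}{M} \sum_{i_{n+1} \notin G} \sum_{i_1 \in [m]} \sum_{\substack{\vec{j}_{n+1} \in [2m+1]_3 \\ \vec{j}_1 \in [2m+1]_2}} D(i_{n+1}, i_1, \vec{j}_{n+1}, \vec{j}_1) \\
&= \frac{1}{M} \sum_{i_{n+1} \notin G} \sum_{\vec{j}_{n+1} \in [2m+1]_3} \sum_{\substack{i_1 \in [m] \\ \vec{j}_1 \in [2m+1]_2}} D(i_{n+1}, i_1, \vec{j}_{n+1}, \vec{j}_1) \\
&= \frac{1}{M} \sum_{i_{n+1} \notin G} \sum_{\vec{j}_{n+1} \in N_3(i_{n+1})} \sum_{\substack{i_1 \in [m] \\ \vec{j}_1 \in [2m+1]_2}} D(i_{n+1}, i_1, \vec{j}_{n+1}, \vec{j}_1) \\
&\quad + \frac{1}{M} \sum_{i_{n+1} \notin G} \sum_{\vec{j}_{n+1} \in [2m+1]_3 \setminus N_3(i_{n+1})} \sum_{\substack{i_1 \in [m] \\ \vec{j}_1 \in [2m+1]_2}} D(i_{n+1}, i_1, \vec{j}_{n+1}, \vec{j}_1) \\
&\le \frac{1}{M} \sum_{i_{n+1} \notin G} \sum_{\vec{j}_{n+1} \in N_3(i_{n+1})} \sum_{\substack{i_1 \in [m] \\ \vec{j}_1 \in [2m+1]_2}} 1 + \frac{1}{M} \sum_{i_{n+1} \notin G} \sum_{\vec{j}_{n+1} \in [2m+1]_3 \setminus N_3(i_{n+1})} \sum_{\substack{i_1 \in [m] \\ \vec{j}_1 \in [2m+1]_2}} (\delta^*/3) \\
&\le \frac{1}{M} \sum_{i_{n+1} \notin G} \sum_{\vec{j}_{n+1} \in N_3(i_{n+1})} m(2m+1)^2 + \frac{1}{M} \sum_{i_{n+1} \notin G} \sum_{\vec{j}_{n+1} \in [2m+1]_3 \setminus N_3(i_{n+1})} (\delta^*/3)\, m(2m+1)^2 \\
&\le \frac{1}{M} \sum_{i_{n+1} \notin G} (\delta^*/12)(2m+1)^3\, m(2m+1)^2 + \frac{1}{M} \sum_{i_{n+1} \notin G} (2m+1)^3 (\delta^*/3)\, m(2m+1)^2 \\
&= \frac{1}{M} \sum_{i_{n+1} \notin G} (\delta^*/12 + \delta^*/3)(2m+1)^3\, m(2m+1)^2 = (5\delta^*/12)\, U(B_1^*)
\end{aligned}$$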



To bound $E_U[D \cdot \chi_{B_2^*}]$ we first need to show that for all $i_{n+1}, i_1 \in [m]$, all $\vec{j}_{n+1} \in [2m+1]_3$, and all $\vec{j}_1 \in [2m+1]_2 \setminus N_2(i_1)$, $D(i_{n+1}, i_1, \vec{j}_{n+1}, \vec{j}_1) \le \delta^*/3$. To see this choose $j^* \in \{j_{n+1,1}, j_{n+1,2}, j_{n+1,3}\} \setminus \{j_{1,1}, j_{1,2}\}$ and calculate:



D{in+l,il,ln+l,fl) 



^ |-"«U' ' ' '31,1 ' ' 'J Jl ^ r /o 

^ 73;^i^ 



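$$\begin{aligned}
E_U[D \cdot \chi_{B_2^*}] &= \frac{1}{M} \sum_{i_{n+1} \in G} \sum_{i_1 \notin G} \sum_{(\vec{j}_{n+1}, \vec{j}_1) \in Z} D(i_{n+1}, i_1, \vec{j}_{n+1}, \vec{j}_1) \\
&\le \frac{1}{M} \sum_{i_{n+1} \in G} \sum_{i_1 \notin G} \sum_{\vec{j}_1 \in N_2(i_1)} \sum_{\vec{j}_{n+1} \in [2m+1]_3} D(i_{n+1}, i_1, \vec{j}_{n+1}, \vec{j}_1) \\
&\quad + \frac{1}{M} \sum_{i_{n+1} \in G} \sum_{i_1 \notin G} \sum_{\vec{j}_1 \in [2m+1]_2 \setminus N_2(i_1)} \sum_{\vec{j}_{n+1} \in [2m+1]_3} D(i_{n+1}, i_1, \vec{j}_{n+1}, \vec{j}_1) \\
&\le \frac{1}{M} \sum_{i_{n+1} \in G} \sum_{i_1 \notin G} \sum_{\vec{j}_1 \in N_2(i_1)} \sum_{\vec{j}_{n+1} \in [2m+1]_3} 1 + \frac{1}{M} \sum_{i_{n+1} \in G} \sum_{i_1 \notin G} \sum_{\vec{j}_1 \in [2m+1]_2 \setminus N_2(i_1)} \sum_{\vec{j}_{n+1} \in [2m+1]_3} (\delta^*/3) \\
&\le \frac{1}{M} \sum_{i_{n+1} \in G} \sum_{i_1 \notin G} (\delta^*/12)(2m+1)^5 + \frac{1}{M} \sum_{i_{n+1} \in G} \sum_{i_1 \notin G} (\delta^*/3)(2m+1)^5 = (5\delta^*/12)\, U(B_2^*)
\end{aligned}$$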

To bound $E_U[D \cdot \chi_{B_3^*}]$, note that for all $(i_{n+1}, i_1, \vec{j}_{n+1}, \vec{j}_1) \in B_3^*$, because $(j_{n+1,1}, j_{n+1,2}, j_{n+1,3}) \in [2m+1]_3 \setminus N_3(i_{n+1})$:

12] 

^ \-Ein+l[Vjn+l,l ^ ^n+1,2 ^n+l.sll ^ X/o 

Therefore $E_U[D \cdot \chi_{B_3^*}] \le (\delta^*/3)\, U(B_3^*)$.

Similarly, to bound $E_U[D \cdot \chi_{B_4^*}]$, observe that for all $(i_{n+1}, i_1, \vec{j}_{n+1}, \vec{j}_1) \in B_4^*$, because $(j_{n+1,2}, j_{1,1}, j_{1,2}) \in [2m+1]_3 \setminus N_3(i_1)$:

[ 2 ) 

< ^ -<V3 

Therefore $E_U[D \cdot \chi_{B_4^*}] \le (\delta^*/3)\, U(B_4^*)$.